Hi Keith,

I don't think we’ve reached consensus on giving the NoSQL implementation its
own top-level module in the main Polaris repo. We only floated MongoDB as a
possible single module in the main repo, and only with a clear justification.

More broadly, I’m worried about packing every new persistence
implementation into the core repo. Each extra back-end means:

   - Extra CI cycles and breakage
   - Developers who now have to be familiar with 5-10 different storage
   engines. That doesn't scale, since different DBs handle things
   differently, e.g., transactions, consistency, or even small things like ID
   generation.
   - Divergent database quirks we have to paper over in shared code

Other OSS projects have made similar choices. For example, Iceberg ships
Spark and Flink integrations in the main repo, while the Trino integration
lives in Trino's own repo. That keeps maintenance sustainable and lets
contributors specialize. In fact, many successful OSS projects limit the
“built-ins” to a handful of high-impact targets, or even just one backend.

My suggestion:

   - Add new backends only when they deliver clear, high-value use cases.
   - Default to a separate repository unless there’s a strong reason to
   co-locate the code.

Happy to discuss further, but we should be very careful about adding new
backends and focus on things that add more value for users. We don't want a
new OSS project like Polaris to struggle with the maintenance cost of many
backends.

Yufei


On Wed, May 28, 2025 at 9:27 AM Keith Chapman <keithgchap...@gmail.com>
wrote:

> Hi Yufei,
>
> Thanks for putting this together, much appreciated. The document does
> highlight some useful design principles (such as not mixing in business
> logic and not leaking implementation details) that apply in general. These
> are not necessarily limited to persistence IMHO.
>
> We have discussed persistence before in many forums (Slack, the ML, community
> syncs, etc.), both in the context of JDBC and NoSQL. The consensus we had was
> that it's better to have the code in the same repository (in a separate
> module). This helps ensure that the code is well integrated into the project
> and that end users can use it easily rather than worrying about how to
> integrate it. Your proposal (a separate repo as the preferred method) is a
> deviation from that consensus.
>
> Regards,
> Keith.
>
> http://keith-chapman.com
>
>
> On Tue, May 27, 2025 at 5:35 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
> > That location is fine, although I do not really see why contributing to
> > Persistence should be any different from contributing to other areas of
> > Polaris code.
> >
> > Cheers,
> > Dmitri.
> >
> > On Tue, May 27, 2025 at 7:03 PM Yufei Gu <flyrain...@gmail.com> wrote:
> >
> > > Dmitri, we could place it alongside the existing contribution guidelines,
> > > https://polaris.apache.org/community/contributing-guidelines/, but I'm
> > > open to suggestions.
> > >
> > > Yufei
> > >
> > >
> > > On Tue, May 27, 2025 at 2:10 PM Dmitri Bourlatchkov <di...@apache.org>
> > > wrote:
> > >
> > > > Hi Yufei,
> > > >
> > > > I posted some comments in the doc.
> > > >
> > > > Where do you intend to publish it?
> > > >
> > > > Do we need a special process for Persistence code contributions on top
> > > > of our general contribution guidelines?
> > > >
> > > > Thanks,
> > > > Dmitri.
> > > >
> > > > On Tue, May 27, 2025 at 4:30 PM Yufei Gu <flyrain...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > After meeting with a few folks from the community (JB, Dmitri, Keith,
> > > > > Russell, etc.), I put together a short guidance doc for anyone who
> > > > > wants to add a new persistence back-end (think DynamoDB, Cassandra,
> > > > > etc.) to Polaris. The goal is to keep our persistence layer clean and
> > > > > pluggable while avoiding surprises in the core codebase.
> > > > >
> > > > > Highlights
> > > > >
> > > > >    - Stay on the public APIs
> > > > >    - No business-logic bleed-through
> > > > >    - UnsupportedOperationException is okay when new API methods appear
> > > > >    and an older impl hasn’t caught up yet.
> > > > >
> > > > > Where does the code live?
> > > > >
> > > > >    - Preferred: its own repo (polaris-dynamodb, etc.) to keep the main
> > > > >    repo slim.
> > > > >    - If you must: a single self-contained module in the main repo,
> > > > >    which needs justification and zero cross-module leakage.
> > > > >    - Need API tweaks? Post the proposal here first, and let’s vote if
> > > > >    it’s a big change.
> > > > >
> > > > > The full draft is here (it’s only a page and a half):
> > > > > https://docs.google.com/document/d/1FEQ3f1XXKG_H7QFI-LN8lEkVljXoNNl2Bx4HVmj3UEI/edit?usp=sharing
> > > > >
> > > > > What I’m asking for
> > > > >
> > > > >    - Does the separation-of-concerns stance feel right?
> > > > >    - Are the API-change steps clear enough?
> > > > >
> > > > > I’ll fold in feedback and post a final version next week. Thanks in
> > > > > advance for the eyes!
> > > > >
> > > > > Yufei
> > > > >
> > > >
> > >
> >
>