> > DAO is rather specific to JPA - so it is in practice specific to > relational databases.
That's not true. DAO could be a pure Java interface without JPA as I mentioned above. Also my POC and design intentionally avoid any JPA related APIs. In fact, both implementers in the POC are non-JPA ones. Yufei On Tue, Feb 18, 2025 at 5:02 AM Robert Stupp <sn...@snazy.de> wrote: > Reminder: not all databases, especially NoSQL databases, are usable with > JPA. DAO is rather specific to JPA - so it is in practice specific to > relational databases. Further, DAOs, if implemented _like_ JPA, require > a ton of logic to be implemented. > > I strongly prefer simple and database agnostic interfaces and not > something that only works for relational databases or FDB. > > For catalog operations (as proposed in this gdoc comment > > https://docs.google.com/document/d/1LlNhEy4cBjjE_um694fcsnizqd3rDm5pbewXkLxvu1o/edit?disco=AAABcK1NcYk > ) > > interface CatalogPersistence { > CatalogState getCurrentState(); > R commitTransactional(Updater<R> updater, Class<R> resultClass); > } > > The first function gives you a consistent state (think: snapshot of > everything) of the catalog as of "now" and the second one allows you to > update the state. > > CatalogState could look like this: > > interface CatalogState { > Stream<Identifier> listChildren(Identifier parent); > Optional<NamespaceObject> getNamespace(Identifier identifier); > C getContent(Identifier identifier, Class<C> clazz); > Optional<ResolvedCatalogPath<OBJ>> resolveCatalogPath( > Identifier identifier, Class<OBJ> expectedLeafType) > } > > `Updater` could a functional interface with a single "RESULT > update(UpdateState state)" function. > > `CatalogUpdateState` could have with function to record intended changes: > > interface CatalogUpdateState extends CatalogState { > <C> void recordChange( > Identifier identifier, > C content, > Class<C> clazz, > Change change); > } > > If `update(CatalogUpdateState)` returns successfully, all changes to the > catalog are persisted atomically and the result propagated to the > caller. If `update()` throws, it's an error and the catalog state > doesn't change. > > Combined with a retry-mechanism, `update()` could also retry in case of > (resolvable) conflicts. > > > For other use cases we can get away with an interface like this: > > public interface Persistence { > <T> T fetch(ObjId id, Class<T> clazz); > // write w/ immedite consistency guarantee > <T> T write(T obj, Class<T> clazz); > // write new row if not yet exists > <T> T conditionalInsert(T obj, Class<T> clazz); > // update row if "expected state marker" matches > <T> T conditionalUpdate(T expected, T update, Class<T> clazz); > > // plus lower-level functions to enable atomic catalog state updates > } > > > On 18.02.25 07:41, Jean-Baptiste Onofré wrote: > > Yes I know (remember I proposed to use DAO so I know what it is :)). > > If it’s clear for everyone, ok for DAO (just wanted to be sure that it’s > > clear for the implementers). > > > > So ok to have CatalogDao for operations and CatalogRecord for carriage > data > > (for instance). > > > > Thanks > > Regards > > JB > > > > Le mar. 18 févr. 2025 à 02:11, Yufei Gu <flyrain...@gmail.com> a écrit : > > > >> Hi JB, > >> > >> DAO is often defined as an interface, used exactly for operations. Why > do > >> we need another name for that? See the examples here > >> <https://www.geeksforgeeks.org/data-access-object-pattern/>, it can be > >> with > >> JPA or non-JPA. In our case, it has to be compatible with non-JPA > >> implementation. > >> > >> interface DeveloperDao { > >> public List<Developer> getAllDevelopers(); > >> public Developer getDeveloper(int DeveloperId); > >> public void updateDeveloper(Developer Developer); > >> public void deleteDeveloper(Developer Developer); > >> } > >> > >> public interface ProductDAO extends JpaRepository<Product, Long> { > >> // Additional custom queries or methods can be declared here > >> } > >> > >> Yufei > >> > >> > >> On Mon, Feb 17, 2025 at 10:58 AM Jean-Baptiste Onofré <j...@nanthrax.net> > >> wrote: > >> > >>> Hi Yufei > >>> > >>> I agree with you. My only concern with CatalogDAO and NamespaceDAO in > the > >>> PoC is that it could not be obvious for implementer. > >>> Being very abstract force us to clearly decouple service and > persistence. > >>> One interface or multiple interfaces is totally fine for me (for the > >>> storage operations). I just think that we have to be clear in the DAO. > >>> > >>> It’s what I proposed and discussed with Jack to also get his expertise > >> for > >>> DynamoDB. > >>> > >>> So, in the PoC, I would: > >>> 1. Clearly define DAO, with enforcement on the data transport (like > using > >>> Java records). Maybe it’s more a naming clarification. We can have > >>> CatalogRecord (Java records) and CatalogStoreOperations (instead of > >>> CatalogDAO and using CatalogRecord) > >>> 2. Split store operations for a backend in separate modules (one for > FDB, > >>> one for Postgre) > >>> > >>> Thoughts ? > >>> > >>> Regards > >>> JB > >>> > >>> Le lun. 17 févr. 2025 à 18:36, Yufei Gu <flyrain...@gmail.com> a > écrit : > >>> > >>>> Hi JB, > >>>> > >>>> The Java objects you mentioned should either belong to the > >>> service/business > >>>> layer or be unnecessary. Apart from the business entities in Polaris, > >>> each > >>>> backend will maintain its own set of persistence-layer entities. The > >>>> backend implementations (whether Postgres, FDB, or MongoDB) will be > >>>> responsible for mapping between these persistence-layer entities and > >> the > >>>> business entities. Currently, the business entity set is tightly > >> coupled > >>>> with FDB, which is not ideal for supporting multiple backends. While > we > >>> can > >>>> refactor them, creating an entirely new set of business entities is > not > >>>> required. > >>>> > >>>> Regarding the PolarisStore you mentioned, it functions similarly to > the > >>> DAO > >>>> layer in the POC. It'd make more sense to split its responsibilities > >> into > >>>> separate interfaces rather than combining them into a single one. A > >>> single > >>>> interface violates the Single Responsibility Principle and may lead to > >>>> maintenance challenges as we introduce new data access patterns. For > >>>> instance, we would be forced to continuously add new methods to > >>> accommodate > >>>> additional entities like Policy and Volume, or to support new query > >>>> patterns for existing entities. > >>>> > >>>> As for choosing between record or immutable classes, either is fine. > >> But > >>>> I'd prefer to use any native Java feature if possible. > >>>> > >>>> > >>>> > >>>> Yufei > >>>> > >>>> > >>>> On Mon, Feb 17, 2025 at 9:10 AM Michael Collado < > >> collado.m...@gmail.com> > >>>> wrote: > >>>> > >>>>> I prefer native Java constructs over third party libraries and > >> compiler > >>>>> plugins, whenever possible. I’m a fan of Java records. > >>>>> > >>>>> Mike > >>>>> > >>>>> On Mon, Feb 17, 2025 at 6:55 AM Jean-Baptiste Onofré < > >> j...@nanthrax.net> > >>>>> wrote: > >>>>> > >>>>>> Hi Robert, > >>>>>> > >>>>>> Yes, my intention about Java records (e.g. > >>>>>> https://openjdk.org/jeps/395) is to leverage: > >>>>>> - Devise an object-oriented construct that expresses a simple > >>>>>> aggregation of values. > >>>>>> - Focus on modeling immutable data rather than extensible behavior. > >>>>>> - Automatically implement data-driven methods such as equals and > >>>>> accessors. > >>>>>> - Preserve long-standing Java principles such as nominal typing and > >>>>>> migration compatibility. > >>>>>> - Provide inner builder, optionally with technical validation > >>>>>> (Objects.NotNull, etc), for instance: > >>>>>> > >>>>>> public record CatalogDAO(String id, String name, ...) { > >>>>>> > >>>>>> public static final class Builder { > >>>>>> String id; > >>>>>> String name; > >>>>>> public Builder() {} > >>>>>> public Builder id(String id) { > >>>>>> this.id = id; > >>>>>> return this; > >>>>>> } > >>>>>> public Builder name(String name) { > >>>>>> this.name = name; > >>>>>> return this; > >>>>>> } > >>>>>> public CatalogDAO build() { > >>>>>> return new CatalogDAO(id, name); > >>>>>> } > >>>>>> } > >>>>>> > >>>>>> } > >>>>>> NB: that's just a "raw" example :) > >>>>>> > >>>>>> That could help us for the "DAO" layer, with clean isolation and > >> data > >>>>>> "transport". The "conversion" from Polaris Entity to DAO could be > >>>>>> performed by the intermediate logic layer (e.g. > >>>>>> PolarisMetaStoreManager), the pure persistence layer can deal with > >>> DAO > >>>>>> only (e.g. PolarisStore). > >>>>>> > >>>>>> Good point about immutable collections. In some projects, I "force" > >>>>>> the copy of a collection to ensure immutability, something like: > >>>>>> > >>>>>> record NamespaceDAO(Set<String> children) { > >>>>>> public NamespaceDAO { > >>>>>> children = Set.copyOf(children); > >>>>>> } > >>>>>> } > >>>>>> > >>>>>> OK, that's not super elegant :) but it does the trick ;) > >>>>>> > >>>>>> Regards > >>>>>> JB > >>>>>> > >>>>>> On Mon, Feb 17, 2025 at 5:28 PM Robert Stupp <sn...@snazy.de> > >> wrote: > >>>>>>> Agree with JB, except I think "immutables" serves the need much > >>>> better. > >>>>>>> Java records are okay, but do not ensure that all fields are > >>>> immutable > >>>>> - > >>>>>>> especially collections. The other pro of immutables is that there > >>> are > >>>>>>> proper, descriptive builders - hence no need to constructors > >> with a > >>>>>>> gazillion parameters. > >>>>>>> > >>>>>>> On 17.02.25 11:42, Jean-Baptiste Onofré wrote: > >>>>>>>> Hi Yufei > >>>>>>>> > >>>>>>>> I left comments in the PR. > >>>>>>>> > >>>>>>>> Thanks for that. That's an interesting approach but slightly > >>>>> different > >>>>>>>> to what I had in mind. > >>>>>>>> > >>>>>>>> My proposal was: > >>>>>>>> > >>>>>>>> 1. The DAOs are Java records clearly descriptive, without > >>> "storage > >>>>>> operations". > >>>>>>>> For instance, we can imagine: > >>>>>>>> > >>>>>>>> public record CatalogDao(String id, String name, String > >> location, > >>>>> ...) > >>>>>> { > >>>>>>>> ... > >>>>>>>> } > >>>>>>>> > >>>>>>>> public record NamespaceDao(String id, String name, String > >> parent, > >>>>> ...) > >>>>>> {} > >>>>>>>> public record TableDao(String id, String name, ...) {} > >>>>>>>> > >>>>>>>> etc > >>>>>>>> > >>>>>>>> The advantages are: > >>>>>>>> - very simple and descriptive > >>>>>>>> - common to any backend implementation > >>>>>>>> > >>>>>>>> 2. The PolarisStore defines the DAO backend operations and > >>> mapping > >>>> to > >>>>>>>> "internal" Polaris entities. Each persistence adapter > >> implements > >>>> the > >>>>>>>> PolarisStore. > >>>>>>>> For the persistence adapter implementer, he just has to > >> implement > >>>> the > >>>>>>>> DAO persistence (no need to understand other Polaris parts). > >>>>>>>> Each persistence adapter is in its own module (for isolation > >> and > >>>>>> dependencies). > >>>>>>>> During the Polaris Persistence Meeting last week, we already > >> got > >>>>>>>> consensus on the approach proposed by Dennis and I. I propose > >> to > >>>> do a > >>>>>>>> new sync/review with Dennis. > >>>>>>>> > >>>>>>>> Regards > >>>>>>>> JB > >>>>>>>> > >>>>>>>> On Mon, Feb 17, 2025 at 9:40 AM Yufei Gu <flyrain...@gmail.com > >>>>> wrote: > >>>>>>>>> Hi Folks: > >>>>>>>>> > >>>>>>>>> Here is a POC for persistence layer refactor. Please check it > >>> out > >>>>> and > >>>>>> let > >>>>>>>>> me know what you think. Please note this is a POC, we still > >>> need a > >>>>>> lot of > >>>>>>>>> effort to complete the refactor. > >>>>>>>>> > >>>>>>>>> PR: https://github.com/apache/polaris/pull/1011. > >>>>>>>>> Design doc: > >>>>>>>>> > >> > https://docs.google.com/document/d/1Vuhw5b9-6KAol2vU3HUs9FJwcgWtiVVXMYhLtGmz53s/edit?tab=t.0 > >>>>>>>>> Experiment: > >>>>>>>>> > >>>>>>>>> - Added a DAO layer for the business entity > >> namespace(except > >>>> the > >>>>>> read). > >>>>>>>>> - Integrated with existing DAO components > >>>>>> (PolarisMetaStoreManager and > >>>>>>>>> PolarisMetaStoreSession). > >>>>>>>>> - All tests passed successfully, including a manual local > >>> run > >>>>>> with Spark > >>>>>>>>> sql. > >>>>>>>>> > >>>>>>>>> Benefits: > >>>>>>>>> > >>>>>>>>> - Compatible with the existing backend(FDB), as we hide it > >>>>> behind > >>>>>> the > >>>>>>>>> new DAO. > >>>>>>>>> - Adding new backends(Postgres/MongoDB) is much easier > >> now, > >>>> esp > >>>>>> for > >>>>>>>>> Postgres, we could be able to use a similar model as > >> Iceberg > >>>>> Jdbc > >>>>>> catalog. > >>>>>>>>> - Allows gradual refactoring to remove old DAO > >> dependencies > >>>>>>>>> (PolarisMetaStoreManager and PolarisMetaStoreSession). > >>>>>>>>> - Enables parallel development of new backend > >>> implementations. > >>>>>>>>> Next Steps: > >>>>>>>>> > >>>>>>>>> - Define business entities one by one to decouple them > >> from > >>>> FDB. > >>>>>>>>> - Create DAO interfaces for each entity to standardize > >>>>> operations > >>>>>> (e.g., > >>>>>>>>> CRUD, ID generation). > >>>>>>>>> - Expand DAO implementations to support additional > >> backends > >>>> over > >>>>>> time. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Yufei > >>>>>>> -- > >>>>>>> Robert Stupp > >>>>>>> @snazy > >>>>>>> > -- > Robert Stupp > @snazy > >