On 17.02.25 18:36, Yufei Gu wrote:
Hi JB,

The Java objects you mentioned should either belong to the service/business
layer or be unnecessary. Apart from the business entities in Polaris, each
backend will maintain its own set of persistence-layer entities. The
backend implementations (whether Postgres, FDB, or MongoDB) will be
responsible for mapping between these persistence-layer entities and the
business entities. Currently, the business entity set is tightly coupled
with FDB, which is not ideal for supporting multiple backends. While we can
refactor them, creating an entirely new set of business entities is not
required.

I disagree that we do not need an entirely new set of business objects. The current entities expose persistence implementation _details_ all the way through business logic and public APIs to clients. Even "well known" properties are not type-safe and require additional re-serializations ("JSON in JSON in JSON"), which is IMO just not necessary.

As mentioned by JB, we already got a consensus during the meeting last week and I think the model discussed during the meeting is definitely worth a try.

Regarding the PolarisStore you mentioned, it functions similarly to the DAO
layer in the POC. It'd make more sense to split its responsibilities into
separate interfaces rather than combining them into a single one. A single
interface violates the Single Responsibility Principle and may lead to
maintenance challenges as we introduce new data access patterns. For
instance, we would be forced to continuously add new methods to accommodate
additional entities like Policy and Volume, or to support new query
patterns for existing entities.

As for choosing between record or immutable classes, either is fine. But
I'd prefer to use any native Java feature if possible.



Yufei


On Mon, Feb 17, 2025 at 9:10 AM Michael Collado <collado.m...@gmail.com>
wrote:

I prefer native Java constructs over third party libraries and compiler
plugins, whenever possible. I’m a fan of Java records.

Mike

On Mon, Feb 17, 2025 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Hi Robert,

Yes, my intention about Java records (e.g.
https://openjdk.org/jeps/395) is to leverage:
- Devise an object-oriented construct that expresses a simple
aggregation of values.
- Focus on modeling immutable data rather than extensible behavior.
- Automatically implement data-driven methods such as equals and
accessors.
- Preserve long-standing Java principles such as nominal typing and
migration compatibility.
- Provide inner builder, optionally with technical validation
(Objects.NotNull, etc), for instance:

public record CatalogDAO(String id, String name, ...) {

   public static final class Builder {
       String id;
       String name;
       public Builder() {}
       public Builder id(String id) {
        this.id = id;
        return this;
      }
      public Builder name(String name) {
        this.name = name;
        return this;
      }
      public CatalogDAO build() {
        return new CatalogDAO(id, name);
      }
   }

}
  NB: that's just a "raw" example :)

That could help us for the "DAO" layer, with clean isolation and data
"transport". The "conversion" from Polaris Entity to DAO could be
performed by the intermediate logic layer (e.g.
PolarisMetaStoreManager), the pure persistence layer can deal with DAO
only (e.g. PolarisStore).

Good point about immutable collections. In some projects, I "force"
the copy of a collection to ensure immutability, something like:

record NamespaceDAO(Set<String> children) {
     public NamespaceDAO {
         children = Set.copyOf(children);
     }
}

OK, that's not super elegant :) but it does the trick ;)

Regards
JB

On Mon, Feb 17, 2025 at 5:28 PM Robert Stupp <sn...@snazy.de> wrote:
Agree with JB, except I think "immutables" serves the need much better.
Java records are okay, but do not ensure that all fields are immutable
-
especially collections. The other pro of immutables is that there are
proper, descriptive builders - hence no need to constructors with a
gazillion parameters.

On 17.02.25 11:42, Jean-Baptiste Onofré wrote:
Hi Yufei

I left comments in the PR.

Thanks for that. That's an interesting approach but slightly
different
to what I had in mind.

My proposal was:

1. The DAOs are Java records clearly descriptive, without "storage
operations".
For instance, we can imagine:

public record CatalogDao(String id, String name, String location,
...)
{
...
}

public record NamespaceDao(String id, String name, String parent,
...)
{}
public record TableDao(String id, String name, ...) {}

etc

The advantages are:
- very simple and descriptive
- common to any backend implementation

2. The PolarisStore defines the DAO backend operations and mapping to
"internal" Polaris entities. Each persistence adapter implements the
PolarisStore.
For the persistence adapter implementer, he just has to implement the
DAO persistence (no need to understand other Polaris parts).
Each persistence adapter is in its own module (for isolation and
dependencies).
During the Polaris Persistence Meeting last week, we already got
consensus on the approach proposed by Dennis and I. I propose to do a
new sync/review with Dennis.

Regards
JB

On Mon, Feb 17, 2025 at 9:40 AM Yufei Gu <flyrain...@gmail.com>
wrote:
Hi Folks:

Here is a POC for persistence layer refactor. Please check it out
and
let
me know what you think. Please note this is a POC, we still need a
lot of
effort to complete the refactor.

PR: https://github.com/apache/polaris/pull/1011.
Design doc:

https://docs.google.com/document/d/1Vuhw5b9-6KAol2vU3HUs9FJwcgWtiVVXMYhLtGmz53s/edit?tab=t.0
Experiment:

     - Added a DAO layer for the business entity namespace(except the
read).
     - Integrated with existing DAO components
(PolarisMetaStoreManager and
     PolarisMetaStoreSession).
     - All tests passed successfully, including a manual local run
with Spark
     sql.

Benefits:

     - Compatible with the existing backend(FDB), as we hide it
behind
the
     new DAO.
     - Adding new backends(Postgres/MongoDB) is much easier now, esp
for
     Postgres, we could be able to use a similar model as Iceberg
Jdbc
catalog.
     - Allows gradual refactoring to remove old DAO dependencies
     (PolarisMetaStoreManager and PolarisMetaStoreSession).
     - Enables parallel development of new backend implementations.

Next Steps:

     - Define business entities one by one to decouple them from FDB.
     - Create DAO interfaces for each entity to standardize
operations
(e.g.,
     CRUD, ID generation).
     - Expand DAO implementations to support additional backends over
time.



Yufei
--
Robert Stupp
@snazy

--
Robert Stupp
@snazy

Reply via email to