Re: Proposal to Separate Polaris Entities

Robert Stupp Tue, 18 Mar 2025 05:48:41 -0700

As mentioned on the PR, I'm in favor of having concern-oriented APIsthat provide use-case oriented functionality without exposing andforce-pushing persistence implementation details upwards.

The proposed approach requires call sites to deal with the parent-childand ID-based data modeling specifics. It is therefore not a databaseagnostic approach. It also pushes a lot of persistence-only concerns tocall sites.

Instead of pushing persistence implementation specifics up to every callsite, I'd rather prefer use-case oriented APIs.

The approach also still requires implementing multiple non-persistenceuse cases *in* each persistence implementation (bootstrap, principalcredentials, tasks, object-storage credential vending, `Resolver`concerns). There are also a lot of `swich` expressions, which do notsatisfy the "easy evolution" claim, but actually prevent this. IMO theproposal does not allow to easily add another type.

What also worries me, even with the existing model, is that failures arecommunicated in the "Go" style, where each call site has to manuallycheck the results. This is error prone and should be replaced with bothJava's Optional and/or exceptions.

> Everything still works seamlessly, even with the remote metastoremanager.

I don't see a "*remote* metastore manager" in the Polaris code base.It's maybe useful information for a particular private implementation,but I do not think that it's useful information for Apache Polaris.

Technically speaking, this is still a proposal, but the PR is "ready forreview". IMO we should all strive to clearly indicate that PRs that arepart of an ongoing discussion should be set to "draft" and not "readyfor review". There are also still quite some "TODOs" in the code, so Ido not think that the change is "ready for review".



On 10.03.25 07:32, Yufei Gu wrote:

Thanks for chiming in. Agreed with JB that we don't need typeCode or
PolarisBaseEntity, we may still keep a root class/interface for Polaris
entities without any fields. So it's more flexible than PolarisBaseEntity.
This is the next step of the effort though. Agreed with Eric, that we
should couple entities together if they are essentially different.

I've also got some exciting news to share—the POC(
https://github.com/apache/polaris/pull/1128) passed all our tests! This is
a big first step forward in our journey towards clearly separating entities
within Polaris.
Here's what's great about this:

    1. *100% Compatible with the current persistence layer: *Everything
    still works seamlessly, even with the remote metastore manager. Different
    database implementations can freely organize entities or methods in ways
    that make sense for them. For example, we can still have common logic for
    resolving multiple types of entities.
    2. *Easy Evolution:* It's now much easier to add and evolve entities.
    This recent improvement helps eliminate those unnecessary coupling like
    tasks and tableLike. They are essentially different entities, but methods
    like createMultipleEntities linked them together. This progress moves us
    closer to our goal of fully separated entities.
    3. *Next Steps:* We can start refactoring our business logic to leverage
    individual DAO objects directly. Thanks to these recent changes, that task
    should be straightforward. Another thing is to make return results
    type-specific so that the persistence objects are completely separated from
    the business objects. Most of the entity result should be easier to
    refactor, with just one or two methods (like loadEntitiesChangeTracking)
    needs a bit more work.

Please take a look at POC, let me know what you think.

Yufei


On Thu, Mar 6, 2025 at 10:31 AM Eric Maynard <eric.w.mayn...@gmail.com>
wrote:

I see the wisdom in Dennis's comments on encapsulation as it relates to
PolarisBaseEntity & inheritance, but I think we need to really examine what
a Principal and Table have in common. If there's really nothing shared,
then JB is right that we should carefully consider the need for
PolarisBaseEntity.

To me, it seems that the only similarity is that these entities "are
persisted" -- that is, they go into the metastore, the cache, and so on. I
would posit that they may not even be persisted in the same way. My
metastore implementation may choose to store tables in one DB and
principals in another. Or principals are stored in one tuple while tables
are stored across many.

Having a shared type and forcing the metastore APIs to strictly interact
with that type implies (if not enforces) a homogeneity that I'm not sure is
desirable.

--EM

--
Robert Stupp
@snazy

Re: Proposal to Separate Polaris Entities

Reply via email to