As mentioned on the PR, I'm in favor of having concern-oriented APIs
that provide use-case oriented functionality without exposing and
force-pushing persistence implementation details upwards.
The proposed approach requires call sites to deal with the parent-child
and ID-based data modeling specifics. It is therefore not a database
agnostic approach. It also pushes a lot of persistence-only concerns to
call sites.
Instead of pushing persistence implementation specifics up to every call
site, I'd rather prefer use-case oriented APIs.
The approach also still requires implementing multiple non-persistence
use cases *in* each persistence implementation (bootstrap, principal
credentials, tasks, object-storage credential vending, `Resolver`
concerns). There are also a lot of `swich` expressions, which do not
satisfy the "easy evolution" claim, but actually prevent this. IMO the
proposal does not allow to easily add another type.
What also worries me, even with the existing model, is that failures are
communicated in the "Go" style, where each call site has to manually
check the results. This is error prone and should be replaced with both
Java's Optional and/or exceptions.
> Everything still works seamlessly, even with the remote metastore
manager.
I don't see a "*remote* metastore manager" in the Polaris code base.
It's maybe useful information for a particular private implementation,
but I do not think that it's useful information for Apache Polaris.
Technically speaking, this is still a proposal, but the PR is "ready for
review". IMO we should all strive to clearly indicate that PRs that are
part of an ongoing discussion should be set to "draft" and not "ready
for review". There are also still quite some "TODOs" in the code, so I
do not think that the change is "ready for review".
On 10.03.25 07:32, Yufei Gu wrote:
Thanks for chiming in. Agreed with JB that we don't need typeCode or
PolarisBaseEntity, we may still keep a root class/interface for Polaris
entities without any fields. So it's more flexible than PolarisBaseEntity.
This is the next step of the effort though. Agreed with Eric, that we
should couple entities together if they are essentially different.
I've also got some exciting news to share—the POC(
https://github.com/apache/polaris/pull/1128) passed all our tests! This is
a big first step forward in our journey towards clearly separating entities
within Polaris.
Here's what's great about this:
1. *100% Compatible with the current persistence layer: *Everything
still works seamlessly, even with the remote metastore manager. Different
database implementations can freely organize entities or methods in ways
that make sense for them. For example, we can still have common logic for
resolving multiple types of entities.
2. *Easy Evolution:* It's now much easier to add and evolve entities.
This recent improvement helps eliminate those unnecessary coupling like
tasks and tableLike. They are essentially different entities, but methods
like createMultipleEntities linked them together. This progress moves us
closer to our goal of fully separated entities.
3. *Next Steps:* We can start refactoring our business logic to leverage
individual DAO objects directly. Thanks to these recent changes, that task
should be straightforward. Another thing is to make return results
type-specific so that the persistence objects are completely separated from
the business objects. Most of the entity result should be easier to
refactor, with just one or two methods (like loadEntitiesChangeTracking)
needs a bit more work.
Please take a look at POC, let me know what you think.
Yufei
On Thu, Mar 6, 2025 at 10:31 AM Eric Maynard <eric.w.mayn...@gmail.com>
wrote:
I see the wisdom in Dennis's comments on encapsulation as it relates to
PolarisBaseEntity & inheritance, but I think we need to really examine what
a Principal and Table have in common. If there's really nothing shared,
then JB is right that we should carefully consider the need for
PolarisBaseEntity.
To me, it seems that the only similarity is that these entities "are
persisted" -- that is, they go into the metastore, the cache, and so on. I
would posit that they may not even be persisted in the same way. My
metastore implementation may choose to store tables in one DB and
principals in another. Or principals are stored in one tuple while tables
are stored across many.
Having a shared type and forcing the metastore APIs to strictly interact
with that type implies (if not enforces) a homogeneity that I'm not sure is
desirable.
--EM
--
Robert Stupp
@snazy