I just filed https://github.com/apache/polaris/issues/540 to track this
feature request, and wrote up this document to help solidify our
terminology and concepts for related federation features, while also
sketching out a proposal for a basic MVP of catalog federation that could
be easily evolved and expanded over time:

https://docs.google.com/document/d/1Q6eEytxb0btpOPcL8RtkULskOlYUCo_3FLvFRnHkzBY/edit?tab=t.0

I agree with Robert that it could be worth initially focusing on
integrating with Iceberg REST Catalogs as the "cleanest" initial way
forward, while ensuring our model can still accommodate federation to other
catalog flavors (especially Hive).

I tried to think through some of the tradeoffs between making entities
fully "implicit" and requiring facade entities (the doc has more details
on the terminology), and captured them in the doc. It seems we should be
able to simplify the requisite code changes by creating facade entities
just-in-time (JIT), which preserves an internally consistent metadata
model of GrantRecords on entities, while still getting the benefit of a
simple "pane of glass" that plumbs through to the remote catalog.
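
To make the JIT idea a bit more concrete, here is a minimal Java sketch
(Java 16+ for records). Every name in it (`RemoteCatalog`, `FacadeEntity`,
`GrantRecord`, `FederatedCatalog`, `InMemoryRemoteCatalog`) is hypothetical
and is not the actual Polaris or Iceberg API; `TABLE_READ_DATA` is used
only as an illustrative privilege name. The idea it shows: a facade entity
is created the first time a remote table is referenced, so GrantRecords
always have a local entity ID to attach to, while the actual read is still
passed through to the remote catalog.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical view of a remote Iceberg REST catalog (not a real Polaris/Iceberg interface). */
interface RemoteCatalog {
  boolean tableExists(String identifier);

  /** Returns an opaque metadata location for the table. */
  String loadTable(String identifier);
}

/** Toy in-memory remote catalog, used only to exercise the sketch. */
final class InMemoryRemoteCatalog implements RemoteCatalog {
  private final Map<String, String> tables = new HashMap<>();

  void put(String identifier, String metadataLocation) {
    tables.put(identifier, metadataLocation);
  }

  @Override
  public boolean tableExists(String identifier) {
    return tables.containsKey(identifier);
  }

  @Override
  public String loadTable(String identifier) {
    return tables.get(identifier);
  }
}

/** Hypothetical local facade entity: exists mainly so grants have a stable entity ID. */
record FacadeEntity(long id, String identifier) {}

/** Hypothetical grant of a privilege on a facade entity to a principal role. */
record GrantRecord(long entityId, String principalRole, String privilege) {}

/** Hypothetical federated catalog; single-threaded for simplicity. */
final class FederatedCatalog {
  private final RemoteCatalog remote;
  private final Map<String, FacadeEntity> facades = new HashMap<>();
  private final Set<GrantRecord> grants = new HashSet<>();
  private long nextId = 1;

  FederatedCatalog(RemoteCatalog remote) {
    this.remote = remote;
  }

  /** Returns the facade for a remote table, creating it just-in-time on first reference. */
  Optional<FacadeEntity> resolve(String identifier) {
    FacadeEntity existing = facades.get(identifier);
    if (existing != null) {
      return Optional.of(existing);
    }
    if (!remote.tableExists(identifier)) {
      return Optional.empty(); // nothing to federate, so no facade is created
    }
    FacadeEntity created = new FacadeEntity(nextId++, identifier);
    facades.put(identifier, created);
    return Optional.of(created);
  }

  /** Records a grant against the (possibly JIT-created) facade entity. */
  void grant(String identifier, String role, String privilege) {
    FacadeEntity entity = resolveOrThrow(identifier);
    grants.add(new GrantRecord(entity.id(), role, privilege));
  }

  /** Authorizes against local GrantRecords, then plumbs the read through to the remote catalog. */
  String loadTable(String identifier, String role) {
    FacadeEntity entity = resolveOrThrow(identifier);
    if (!grants.contains(new GrantRecord(entity.id(), role, "TABLE_READ_DATA"))) {
      throw new SecurityException(role + " may not read " + identifier);
    }
    return remote.loadTable(identifier);
  }

  int facadeCount() {
    return facades.size();
  }

  private FacadeEntity resolveOrThrow(String identifier) {
    return resolve(identifier)
        .orElseThrow(() -> new IllegalArgumentException("No such remote table: " + identifier));
  }
}
```

A fuller design would also have to deal with tables dropped or renamed
directly in the remote catalog (stale facades), which this sketch ignores.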

Any feedback/comments would be much appreciated!

On Tue, Nov 5, 2024 at 9:33 AM Michael Collado <collado.m...@gmail.com>
wrote:

> When I said "migrate", I specifically meant the feature to actually move
> the tables to a new source catalog. Polaris as a _path_ to migration
> should definitely be a core feature - it's the reason External catalogs
> are supported at all. Adding full delegation, IMO, helps fulfill the
> purpose of External catalog support, so I'm fully in favor. I do think we
> ought to be picky about which catalogs we explicitly add support for, since
> some of them are going to have far more impact than others (e.g., Hive).
>
> Mike
>
> On Tue, Nov 5, 2024 at 8:32 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
> > I think it's really important that we have the functionality to move
> > folks off of existing catalog solutions, be they Hive, Hadoop, or
> > whatnot. Even with the dependency issues, I think this should be core
> > functionality of Polaris. It would make it much easier for folks on
> > non-REST catalogs to migrate to a REST solution. Even if true "migrate"
> > functionality is punted until later, it would be great if we could at
> > least provide a path toward REST entirely within Polaris. I know that
> > for us to federate with systems like OneLake that don't have a REST
> > endpoint, this will be a requirement.
> >
> > On Thu, Oct 31, 2024 at 12:15 PM Michael Collado <collado.m...@gmail.com>
> > wrote:
> >
> > > IMO, anything that's not directly managed by Polaris source code is
> > > external. If someone can create a table directly in the other catalog
> > > and it shows up in Polaris, it's external.
> > >
> > > As of now, all of the direct manipulation APIs are blocked for external
> > > catalogs - createTable, updateTable, etc. I think that's something we
> > > can change so that if we *can* delegate to a remote catalog, we should.
> > > In such a case, the privileges defined in Polaris should be applied
> > > (though, of course, we can't guarantee that they're applied by the
> > > remote catalog).
> > >
> > > Credential vending rules should apply according to the External catalog
> > > rules - if they're disabled for external catalogs, they can be enabled
> > > on a per-catalog basis.
> > >
> > > "migrate" would be a cool feature, but I'd put it low on the priority
> > > list.
> > >
> > > Those are my thoughts, anyway.
> > >
> > > On Thu, Oct 31, 2024 at 9:22 AM Russell Spitzer <russell.spit...@gmail.com>
> > > wrote:
> > >
> > > > Hi Y’all,
> > > >
> > > > Some of us at Snowflake and Revolut have been talking a bit about how
> > > > Apache Polaris can be used in conjunction with older implementations of
> > > > Apache Iceberg Catalogs. At Revolut, they have already built a version
> > > > of this which allows them to use Polaris capabilities on top of an old
> > > > Catalog implementation and gives them a migration path toward a 100%
> > > > Polaris solution in the future. Those of us at Snowflake thought this
> > > > would be a great capability for Polaris to support natively, so we were
> > > > hoping to bring up the topic with the greater Apache Polaris community
> > > > to see if other folks are interested in this and to help us suss out
> > > > what questions we need to answer to fit everyone’s use cases.
> > > >
> > > > To start off the conversation, here are a few questions I think we
> > > > need to answer:
> > > >
> > > > 1. Do we consider Iceberg Catalogs wrapped by the Polaris catalog as
> > > >    External or part of Polaris?
> > > > 2. What should we pull into Polaris for policy application? Namespaces?
> > > >    Tables? How often should they be listed?
> > > > 3. Should we attempt any type of credential forwarding, or should we
> > > >    just assume Polaris gets full access to the wrapped Catalog?
> > > > 4. Should we provide a “migrate” functionality that allows someone to
> > > >    “end” the federation and move to a solely Polaris architecture?
> > > >
> > > > I’m sure there are more, but I want to get the ball rolling so the
> > > > folks at Revolut can work on a proposal with the community already
> > > > somewhat in sync about requirements/ideas. I'm linking to a proposal
> > > > <https://docs.google.com/document/d/17GPe_qawoVlEJe3qwGKgMJlx2CYU97-Jz_npTy_b8_k/edit?tab=t.0#heading=h.nuhm7bltkglt>
> > > > from @Almaz Galiev <almaz.gal...@revolut.com> to start thinking about
> > > > this.
> > > >
> > > >
> > > > Thanks for your time as always,
> > > > Russ
> > > >
> > >
> >
>
