Hey all!

I totally agree that, going forward, metastore-level access would be much
more useful and correct for replicating than the Polaris APIs, and that
some sort of import/export/polaris_dump functionality would open up a wider
variety of use cases. I don't think these have to be mutually exclusive
with the tool in its current state, however. As the tool is designed right
now, we can think of it as taking Polaris entities from a source and
creating those entities in a target destination. While it only supports
Polaris-API-to-Polaris-API migrations right now, we can later make the
source and target configurable so that they don't have to be Polaris APIs.
For example:

   - Export: the "source" could be the Polaris APIs and the "target" could
   be a file that we write the entities and their properties to
   - Import: the "source" could be a file from a previous export and the
   "target" could be the Polaris APIs
   - OSS to managed: the "source" could be a metastore-level connection and
   the "target" a Polaris API
   - Managed to OSS: the "source" could be the Polaris APIs and the
   "target" a metastore connection
   - or really any combination of the above; we just need a common
   interface for them all to implement
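To make the "common interface" idea concrete, here is a minimal sketch, written in Python for brevity; it is not the tool's actual code. All names (`Entity`, `EntitySource`, `EntityTarget`, `InMemoryStore`, `JsonFileStore`, `migrate`) are hypothetical, and `InMemoryStore` is only a placeholder for a real Polaris API or metastore connection. It shows the export and import cases from the list above as a round trip through a JSON file.

```python
# Hypothetical sketch of a pluggable source/target migration design.
# None of these names exist in polaris-tools; they only illustrate the idea.
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Iterable, List, Protocol


@dataclass
class Entity:
    """Simplified stand-in for a Polaris entity (catalog, principal, role, ...)."""
    kind: str
    name: str
    properties: dict = field(default_factory=dict)


class EntitySource(Protocol):
    def read_entities(self) -> Iterable[Entity]: ...


class EntityTarget(Protocol):
    def write_entities(self, entities: Iterable[Entity]) -> None: ...


class InMemoryStore:
    """Placeholder for a Polaris API or metastore connection; usable as source or target."""

    def __init__(self, entities: Iterable[Entity] = ()) -> None:
        self.entities: List[Entity] = list(entities)

    def read_entities(self) -> List[Entity]:
        return list(self.entities)

    def write_entities(self, entities: Iterable[Entity]) -> None:
        self.entities.extend(entities)


class JsonFileStore:
    """Export target / import source backed by a JSON file (the "file" case)."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def read_entities(self) -> List[Entity]:
        return [Entity(**item) for item in json.loads(self.path.read_text())]

    def write_entities(self, entities: Iterable[Entity]) -> None:
        self.path.write_text(json.dumps([asdict(e) for e in entities], indent=2))


def migrate(source: EntitySource, target: EntityTarget) -> int:
    """Copy every entity from source to target; returns how many were copied."""
    entities = list(source.read_entities())
    target.write_entities(entities)
    return len(entities)


if __name__ == "__main__":
    origin = InMemoryStore([
        Entity("catalog", "analytics", {"owner": "data-eng"}),
        Entity("principal", "etl-user"),
    ])
    with tempfile.TemporaryDirectory() as tmp:
        dump = JsonFileStore(Path(tmp) / "polaris_dump.json")
        migrate(origin, dump)      # export: "API" -> file
        restored = InMemoryStore()
        migrate(dump, restored)    # import: file -> "API"
        print(restored.entities == origin.entities)  # prints True
```

Because `migrate` depends only on the two small interfaces, the OSS-to-managed and managed-to-OSS cases would mean writing one more source or target class rather than changing the migration loop itself.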

I'm working on a draft right now that generifies the interfaces enough that
I think we can write implementations down the line to support the
functionality above. Before I dive too deep into that implementation, does
that plan sound okay to everyone?

Based on conversations with some customers, the tool seems to provide value
even in its current state, so how do people feel about moving forward with
that initial structure, knowing it can easily be extended to add more
thorough features and use cases?

The customers I'm talking to want something easier than writing their own
migration utility from scratch, so that they can do more ad-hoc backups and
migrations. This initial version has been sufficient to give them peace of
mind that there are utilities to ease a migration to or from a managed or
OSS self-hosted instance.

I agree that there are definitely a lot of improvements that can be made on
top of this to make the experience even smoother and to widen the use cases
the migrator can serve.

Thanks all for the feedback!
Sehaj

On Mon, Apr 14, 2025 at 2:57 PM Dmitri Bourlatchkov <di...@apache.org>
wrote:

> Export / Import does not have to be at the Persistence layer. If the Admin
> API is expressive enough, it should be possible to use it instead, I guess.
>
> Cheers,
> Dmitri.
>
> On Fri, Apr 11, 2025 at 9:17 PM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
>
> > For keeping two Polaris instances in sync, I agree that replicating at
> > the persistence layer probably makes the most sense.
> >
> > However there are cases when you want to copy data from one Polaris
> > instance to another but you may not have direct access to the metastore.
> > For example, migrating from a self-hosted Polaris instance to a managed
> > offering. To support these cases, I think a tool like this can be useful.
> >
> > On Fri, Apr 11, 2025 at 6:12 PM Ajantha Bhat <ajanthab...@gmail.com>
> > wrote:
> >
> > > Hey, thanks for the proposal, and I agree with Yufei.
> > >
> > > We had a backend synchronization CLI for Project Nessie [1]. Maybe we
> > > can have something similar to that instead of taking the long path of
> > > using register table for migration between Polaris instances.
> > >
> > > [1] https://projectnessie.org/nessie-0-82-0/export_import/
> > >
> > > - Ajantha
> > >
> > > On Sat, Apr 12, 2025 at 5:54 AM Yufei Gu <flyrain...@gmail.com> wrote:
> > >
> > > > Thanks, Mansehaj, for the proposal! This tool has potential, but I
> > > > think we should clarify its capabilities a bit more explicitly. Given
> > > > its current limitations, I'm not sure how broadly useful it would be.
> > > > Have we explored any alternative approaches, for example performing
> > > > synchronization on the backend (FoundationDB, Postgres)?
> > > >
> > > >
> > > > Yufei
> > > >
> > > > On Thu, Apr 10, 2025 at 4:22 PM Mansehaj Singh
> > > > <mansehaj.si...@snowflake.com.invalid> wrote:
> > > >
> > > > > Hi all! Nice to meet you.
> > > > >
> > > > > I recently opened up https://github.com/apache/polaris-tools/pull/4
> > > > > to add a Polaris migration/synchronizer tool I've been working on to
> > > > > the polaris-tools repo. By request, I'm sharing a design document
> > > > > here detailing how the tool works and the roadmap for functionality
> > > > > that is in development.
> > > > >
> > > > > Here's the design doc giving a full overview:
> > > > >
> > > > > https://docs.google.com/document/d/1AXKmzp3JaTuUS_FMNnxr_pHsBTs86rWRMborMi3deCw/edit?usp=sharing
> > > > >
> > > > > To summarize:
> > > > >
> > > > > We can think of this tool as a configurable mirroring/migration tool
> > > > > to migrate between two Polaris instances. I believe this would enable
> > > > > and support many use cases that are quite cumbersome to carry out
> > > > > manually today, and break down barriers to switching between open
> > > > > source and managed offerings of Polaris. The tool has been designed
> > > > > with goals in mind that go beyond supporting just the CLI
> > > > > implementation.
> > > > >
> > > > > Please take a look at the design doc if you're interested!
> > > > >
> > > > > Thank you!
> > > > > - Sehaj
> > > > >
> > > >
> > >
> >
>
