Hi everyone,

I have merged in https://github.com/mansehajsingh/polaris-tools/pull/4 to
give us a better framework for adding file import/export to the tool in the
future.

This puts the Polaris read/write source behind a common interface that we
can implement in new variations for the different kinds of migration we
want to perform later, including reading from and writing to a file. For
now, the tool still supports only the Polaris API as a source and a target.
Hopefully this is a good starting point that lets us keep reusing the core
logic of the tool going forward. Reviews of the main PR
(https://github.com/apache/polaris-tools/pull/4) are welcome!
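To illustrate the idea of a shared source/target interface, here is a minimal Java sketch under stated assumptions. All of the names (Entity, EntitySource, EntityTarget, InMemoryStore, TextExportTarget, migrate) are hypothetical and are not the actual interfaces in the PR. The in-memory store stands in for a Polaris API client, and the text target stands in for a file export.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the actual polaris-tools PR.
public class MigrationSketch {

    // Simplified stand-in for a Polaris entity (catalog, principal, role, ...).
    record Entity(String type, String name, Map<String, String> properties) {}

    // Anything entities can be read from: Polaris API, export file, metastore, ...
    interface EntitySource {
        List<Entity> listEntities();
    }

    // Anything entities can be written to: Polaris API, export file, metastore, ...
    interface EntityTarget {
        void createEntity(Entity entity);
    }

    // In-memory store standing in for a real Polaris API client in this sketch.
    static final class InMemoryStore implements EntitySource, EntityTarget {
        private final List<Entity> entities = new ArrayList<>();

        @Override
        public List<Entity> listEntities() {
            return List.copyOf(entities);
        }

        @Override
        public void createEntity(Entity entity) {
            entities.add(entity);
        }
    }

    // Export-style target: appends one "type<TAB>name" line per entity to a buffer
    // (a real exporter would write to a file and include the properties too).
    static final class TextExportTarget implements EntityTarget {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void createEntity(Entity entity) {
            out.append(entity.type()).append('\t').append(entity.name()).append('\n');
        }

        String contents() {
            return out.toString();
        }
    }

    // Core migration logic: knows nothing about what the source and target are.
    // Returns the number of entities copied.
    static int migrate(EntitySource source, EntityTarget target) {
        int copied = 0;
        for (Entity entity : source.listEntities()) {
            target.createEntity(entity);
            copied++;
        }
        return copied;
    }

    public static void main(String[] args) {
        InMemoryStore source = new InMemoryStore();
        source.createEntity(new Entity("catalog", "analytics", Map.of()));
        source.createEntity(new Entity("principal", "etl-user", Map.of()));

        // "API to API": both ends are in-memory stores here.
        InMemoryStore target = new InMemoryStore();
        System.out.println("Copied " + migrate(source, target) + " entities");

        // "Export": same source and same migrate() call, different target.
        TextExportTarget export = new TextExportTarget();
        migrate(source, export);
        System.out.print(export.contents());
    }
}
```

With this shape, adding an import, OSS-to-managed, or managed-to-OSS mode would mostly mean writing another EntitySource or EntityTarget implementation, while leaving migrate() unchanged.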

Thanks everyone,
Sehaj

On Mon, Apr 14, 2025 at 3:45 PM Mansehaj Singh <mansehaj.si...@snowflake.com>
wrote:

> Hey all!
>
> Totally agree that, going forward, metastore-level access would be much more
> useful and correct for replication than the Polaris APIs, and that some
> sort of import/export/polaris_dump functionality would open up a greater
> variety of use cases in the future. I don't think these have to be mutually
> exclusive with the tool in its current state, however. With the tool
> designed the way it is right now, we can essentially think of it as taking
> Polaris entities from a source, and creating those entities in a target
> destination. While it only supports Polaris API to API migrations right
> now, we can add support in the future for configuring this so that the
> source/target don't have to be Polaris APIs. For example:
>
>    - Export: the "source" could be the Polaris APIs and the "target"
>    could be a file that we are just writing the entities and their properties
>    to
>    - Import: the "source" could be a file from a previous export and the
>    "target" could be the Polaris APIs
>    - OSS to managed: the "source" could be a metastore-level connection
>    and the "target" a Polaris API
>    - Managed to OSS: the "source" could be the Polaris APIs and the
>    "target" a metastore connection
>    - or really any combo of the above, we just need a common interface
>    for them all to implement
>
> I'm working on a draft right now that generalizes the interfaces enough
> that we can write implementations down the line to support the
> functionality above. Before I dive too deep into that implementation, does
> this plan sound okay to everyone?
>
> Based on conversations with some customers, the tool seems to provide value
> even in its current state. How do people feel about moving forward with
> this initial structure, knowing it can easily be extended to add more
> thorough features and use cases?
>
> The customers I'm talking to want something easier than writing their own
> migration utility from scratch, so they can do more ad-hoc backups and
> migrations. This initial version has been enough to give them peace of mind
> that there are utilities to ease a migration to or from a managed or
> self-hosted OSS instance.
>
> I agree that there are definitely a lot of improvements that can be made on
> top of this to make the experience even smoother and widen the use cases
> the migrator can be used for.
>
> Thanks all for the feedback!
> Sehaj
>
> On Mon, Apr 14, 2025 at 2:57 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
>> Export / Import does not have to be at the Persistence layer. If the Admin
>> API is expressive enough, it should be possible to use it instead, I guess.
>>
>> Cheers,
>> Dmitri.
>>
>> On Fri, Apr 11, 2025 at 9:17 PM Eric Maynard <eric.w.mayn...@gmail.com>
>> wrote:
>>
>> > For keeping two Polaris instances in sync, I agree that replicating at the
>> > persistence layer probably makes the most sense.
>> >
>> > However, there are cases where you want to copy data from one Polaris
>> > instance to another but may not have direct access to the metastore.
>> > For example, migrating from a self-hosted Polaris instance to a managed
>> > offering. To support these cases, I think a tool like this can be useful.
>> >
>> > On Fri, Apr 11, 2025 at 6:12 PM Ajantha Bhat <ajanthab...@gmail.com>
>> > wrote:
>> >
>> > > Hey, thanks for the proposal, and I agree with Yufei.
>> > >
>> > > We had a backend synchronization CLI for Project Nessie [1]. Maybe we
>> > > can have something similar instead of taking the long route of
>> > > registering tables to migrate between Polaris instances.
>> > >
>> > > [1] https://projectnessie.org/nessie-0-82-0/export_import/
>> > >
>> > > - Ajantha
>> > >
>> > > On Sat, Apr 12, 2025 at 5:54 AM Yufei Gu <flyrain...@gmail.com>
>> > > wrote:
>> > >
>> > > > Thanks, Mansehaj, for the proposal! This tool has potential, but I
>> > > > think we should clarify its capabilities more explicitly. Given its
>> > > > current limitations, I'm not sure how broadly useful it would be. Have
>> > > > we explored any alternative approaches, for example performing
>> > > > synchronization on the backend (FoundationDB, Postgres)?
>> > > >
>> > > >
>> > > > Yufei
>> > > >
>> > > > On Thu, Apr 10, 2025 at 4:22 PM Mansehaj Singh
>> > > > <mansehaj.si...@snowflake.com.invalid> wrote:
>> > > >
>> > > > > Hi all! Nice to meet you.
>> > > > >
>> > > > > I opened up https://github.com/apache/polaris-tools/pull/4 recently
>> > > > > to add a Polaris migration/synchronizer tool I've been working on to
>> > > > > the polaris-tools repo. By request, I'm sharing a design document
>> > > > > here detailing how the tool works and the roadmap for functionality
>> > > > > that is in development.
>> > > > >
>> > > > > Here's the design doc giving a full overview:
>> > > > >
>> > > > > https://docs.google.com/document/d/1AXKmzp3JaTuUS_FMNnxr_pHsBTs86rWRMborMi3deCw/edit?usp=sharing
>> > > > >
>> > > > >
>> > > > > To summarize:
>> > > > >
>> > > > > We can think of this tool as a configurable mirroring/migration
>> > > > > tool for migrating between two Polaris instances. I believe it would
>> > > > > enable many use cases that are quite cumbersome to carry out
>> > > > > manually today, and break down barriers to switching between open
>> > > > > source and managed offerings of Polaris. The tool has been designed
>> > > > > with goals in mind that go beyond supporting just the CLI
>> > > > > implementation.
>> > > > >
>> > > > > Please take a look at the design doc if you're interested!
>> > > > >
>> > > > > Thank you!
>> > > > > - Sehaj
>> > > > >
>> > > >
>> > >
>> >
>>
>
