Hi everyone,

I have merged https://github.com/mansehajsingh/polaris-tools/pull/4 to give us a better framework for adding file import/export to the tool in the future.
This basically puts the Polaris read/write source behind a common interface that we can implement with new variations for the different migration schemes we want to support later, including reading from and writing to a file. Right now the tool still supports only the Polaris API as both source and target. Hopefully this will be a good starting point that lets us reuse the core logic of the tool in the future as well.

Reviews of the main PR (https://github.com/apache/polaris-tools/pull/4) are welcome!

Thanks everyone,
Sehaj

On Mon, Apr 14, 2025 at 3:45 PM Mansehaj Singh <mansehaj.si...@snowflake.com> wrote:
> Hey all!
>
> Totally agree that, going forward, metastore-level access would be much more useful and more correct for replication than the Polaris APIs, and that some sort of import/export/polaris_dump functionality would open up a greater variety of use cases in the future. I don't think these have to be mutually exclusive with the tool in its current state, however. With the tool designed the way it is right now, we can essentially think of it as taking Polaris entities from a source and creating those entities in a target destination. While it only supports Polaris API-to-API migrations right now, we can add support in the future for configuring this so that the source and target don't have to be Polaris APIs.
> For example:
>
> - Export: the "source" could be the Polaris APIs and the "target" could be a file that we just write the entities and their properties to
> - Import: the "source" could be a file from a previous export and the "target" could be the Polaris APIs
> - OSS to managed: the "source" could be a metastore-level connection and the "target" a Polaris API
> - Managed to OSS: the "source" could be the Polaris APIs and the "target" a metastore connection
> - or really any combination of the above; we just need a common interface for them all to implement
>
> I'm working on a draft right now that generifies the interfaces enough that I think we can write implementations down the line to support the functionality above. Before I dive too deep into that implementation, does this plan sound okay to everyone?
>
> Based on discussions with some customers, the tool seems to provide value even in its current state, so how do people feel about moving forward with that initial structure, knowing it can easily be extended with more thorough features and use cases?
>
> The customers I'm talking to want something easier than fully writing their own migration utility just to do ad-hoc backups or migrations, and this initial version has been enough to give them peace of mind that there are utilities to ease a migration to or from a managed or self-hosted OSS instance.
>
> I agree that there are definitely a lot of improvements that can be made on top of this to make the experience even smoother and to widen the use cases the migrator can serve, though.
>
> Thanks all for the feedback!
> Sehaj
>
> On Mon, Apr 14, 2025 at 2:57 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
>
>> Export/import does not have to happen at the persistence layer. If the Admin API is expressive enough, it should be possible to use it instead, I guess.
>>
>> Cheers,
>> Dmitri.
>>
>> On Fri, Apr 11, 2025 at 9:17 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
>>
>> > For keeping two Polaris instances in sync, I agree that replicating at the persistence layer probably makes the most sense.
>> >
>> > However, there are cases where you want to copy data from one Polaris instance to another but may not have direct access to the metastore, for example, when migrating from a self-hosted Polaris instance to a managed offering. To support these cases, I think a tool like this can be useful.
>> >
>> > On Fri, Apr 11, 2025 at 6:12 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>> >
>> > > Hey, thanks for the proposal, and I agree with Yufei.
>> > >
>> > > We had a backend synchronization CLI for projectNessie [1]. Maybe we can have something similar to that instead of taking the long path of registering tables to migrate between Polaris instances.
>> > >
>> > > [1] https://projectnessie.org/nessie-0-82-0/export_import/
>> > >
>> > > - Ajantha
>> > >
>> > > On Sat, Apr 12, 2025 at 5:54 AM Yufei Gu <flyrain...@gmail.com> wrote:
>> > >
>> > > > Thanks, Mansehaj, for the proposal! This tool has potential, but I think we should clarify its capabilities a bit more explicitly. Given its current limitations, I'm not sure how broadly useful it would be. Have we explored any alternative approaches, for example, performing synchronization on the backend (FoundationDB, Postgres)?
>> > > >
>> > > > Yufei
>> > > >
>> > > > On Thu, Apr 10, 2025 at 4:22 PM Mansehaj Singh <mansehaj.si...@snowflake.com.invalid> wrote:
>> > > > >
>> > > > > Hi all! Nice to meet you.
>> > > > >
>> > > > > I recently opened https://github.com/apache/polaris-tools/pull/4 to add a Polaris migration/synchronizer tool I've been working on to the polaris-tools repo.
>> > > > > By request, I'm sharing a design document here detailing how the tool works and the roadmap for functionality that is in development.
>> > > > >
>> > > > > Here's the design doc giving a full overview:
>> > > > >
>> > > > > https://docs.google.com/document/d/1AXKmzp3JaTuUS_FMNnxr_pHsBTs86rWRMborMi3deCw/edit?usp=sharing
>> > > > >
>> > > > > To summarize:
>> > > > >
>> > > > > We can think of this tool as a configurable mirroring/migration tool for migrating between two Polaris instances. I believe this would enable and support many use cases that are quite cumbersome to carry out manually today, and would break down barriers to switching between open source and managed offerings of Polaris. The tool has been designed with goals in mind that go beyond supporting just the CLI implementation.
>> > > > >
>> > > > > Please take a look at the design doc if you're interested!
>> > > > >
>> > > > > Thank you!
>> > > > > - Sehaj
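As a rough illustration of the common source/target interface discussed in this thread, here is a minimal Java sketch, assuming a deliberately simplified entity model. None of these names (`Entity`, `EntitySource`, `EntityTarget`, `Migrator`, `InMemoryStore`, `FileStore`) come from the polaris-tools PR; they are hypothetical. `InMemoryStore` stands in for a Polaris API client, and the tab-separated file format is an illustrative assumption for the export/import case, not a proposed format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for a Polaris entity (catalog, principal, role, ...).
record Entity(String type, String name, Map<String, String> properties) {}

// Hypothetical interface: anything entities can be read from (Polaris API, export file, metastore).
interface EntitySource {
    List<Entity> readAll() throws IOException;
}

// Hypothetical interface: anything entities can be created in.
interface EntityTarget {
    void create(Entity entity) throws IOException;
}

// Core migration loop, written once against the two interfaces; returns the number of entities copied.
final class Migrator {
    static int migrate(EntitySource source, EntityTarget target) throws IOException {
        int count = 0;
        for (Entity entity : source.readAll()) {
            target.create(entity);
            count++;
        }
        return count;
    }
}

// In-memory stand-in for a Polaris API client; a real one would call the Polaris REST APIs.
final class InMemoryStore implements EntitySource, EntityTarget {
    private final List<Entity> entities = new ArrayList<>();

    @Override public List<Entity> readAll() { return List.copyOf(entities); }

    @Override public void create(Entity entity) { entities.add(entity); }
}

// Hypothetical export/import file: one entity per line as type<TAB>name<TAB>key=value...
// Assumes non-empty type/name, no tabs or newlines anywhere, and no '=' inside property keys.
final class FileStore implements EntitySource, EntityTarget {
    private final Path path;

    FileStore(Path path) { this.path = path; }

    @Override
    public void create(Entity entity) throws IOException {
        StringBuilder line = new StringBuilder(entity.type()).append('\t').append(entity.name());
        // Sort keys so repeated exports of the same entity produce identical lines.
        new TreeMap<>(entity.properties())
                .forEach((k, v) -> line.append('\t').append(k).append('=').append(v));
        line.append('\n');
        Files.writeString(path, line, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    @Override
    public List<Entity> readAll() throws IOException {
        List<Entity> result = new ArrayList<>();
        for (String line : Files.readAllLines(path, StandardCharsets.UTF_8)) {
            String[] fields = line.split("\t");
            Map<String, String> properties = new TreeMap<>();
            for (int i = 2; i < fields.length; i++) {
                int eq = fields[i].indexOf('='); // split on the first '=' only
                properties.put(fields[i].substring(0, eq), fields[i].substring(eq + 1));
            }
            result.add(new Entity(fields[0], fields[1], properties));
        }
        return result;
    }
}

public class MigrationSketch {
    public static void main(String[] args) throws IOException {
        InMemoryStore sourceApi = new InMemoryStore();
        sourceApi.create(new Entity("catalog", "sales",
                Map.of("default-base-location", "s3://bucket/sales")));

        // createTempFile returns an empty file, so the appends below start a clean export.
        FileStore exportFile = new FileStore(Files.createTempFile("polaris-export", ".tsv"));
        int exported = Migrator.migrate(sourceApi, exportFile); // export: API -> file

        InMemoryStore targetApi = new InMemoryStore();
        int imported = Migrator.migrate(exportFile, targetApi); // import: file -> API

        System.out.println(exported + " exported, " + imported + " imported");
        System.out.println(targetApi.readAll().equals(sourceApi.readAll()));
    }
}
```

Running `main` prints `1 exported, 1 imported` followed by `true`. Because `Migrator.migrate` only sees the two interfaces, a metastore-backed source or target would be one more implementation rather than a change to the core loop, which is the kind of extensibility the current structure is meant to allow.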