Hi Yong, thanks for the proposal! I think this is a great idea with clear
and practical use cases.

One important thing we should probably align on early is the input format
for the tool. That’s also where I see a good opportunity to share logic
with the synchronizer: even though the tools serve different purposes, they
could still use a common config structure and frontend.

The mysqldump-style export is an interesting idea, though it might be too
specific. Given Polaris’s relatively simple entity model, a JSON format
might offer a good balance between readability and flexibility. It's
git-friendly, human-readable, and storage-agnostic (Postgres/FDB/MongoDB,
etc.).
Here are a few existing tools and options:

   1. Postgres JSON export example
   <https://alphahydrae.com/2021/02/how-to-export-postgresql-data-to-a-json-file>
   2. FoundationDB backup to JSON
   <https://apple.github.io/foundationdb/backups.html>
   3. Or we can just use the Polaris REST APIs to pull JSON directly from
   HTTP responses.
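
To make the format discussion a bit more concrete, here is a minimal sketch
of what a storage-agnostic, git-friendly JSON export could look like. The
top-level keys (`principals`, `principal_roles`, `catalogs`), the entity
fields, and the `export_entities` helper are all hypothetical, chosen only
for illustration; they are not an existing Polaris schema or API.

```python
import json


def export_entities(entities: dict) -> str:
    """Serialize entities to deterministic JSON so git diffs stay small.

    `entities` maps a hypothetical entity kind (e.g. "principals") to a
    list of dicts that each carry at least a "name" field.
    """
    # Sort each list by name so the output does not depend on the order
    # in which the backing store happened to return the entities.
    normalized = {
        kind: sorted(items, key=lambda entity: entity["name"])
        for kind, items in entities.items()
    }
    # sort_keys gives stable key order; the trailing newline keeps
    # line-based tools (diff, git) happy.
    return json.dumps(normalized, indent=2, sort_keys=True) + "\n"


# Hypothetical snapshot; a real one could come from any backend.
snapshot = {
    "principals": [{"name": "etl_user"}, {"name": "analyst"}],
    "principal_roles": [{"name": "data_engineer"}],
    "catalogs": [{"name": "dev", "type": "INTERNAL"}],
}
exported = export_entities(snapshot)
```

Because the output is deterministic, two exports of the same state produce
identical files regardless of which backend they came from.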

I'd suggest we start by agreeing on the format first, which will make it
easier to build consistent tooling on top, whether it's a standalone
bootstrapping tool or integrated with the sync tool.
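
As a rough illustration of tooling that could sit on top of an agreed
format, the sketch below compares a desired-state document with the current
state and reports what would need to change. The entity layout (lists of
dicts keyed by a `name` field) and the `plan_changes` helper are assumptions
made for this example only, not part of the PR or the sync tool. Entities
that exist but are not declared are only reported, never deleted, since
deletes can be destructive.

```python
def plan_changes(desired: dict, current: dict) -> dict:
    """Compute the gap between desired and current entities.

    Both arguments map a hypothetical entity kind to a list of dicts
    that each carry a "name" field.
    """
    plan = {"create": [], "update": [], "unmanaged": []}
    for kind in sorted(set(desired) | set(current)):
        want = {e["name"]: e for e in desired.get(kind, [])}
        have = {e["name"]: e for e in current.get(kind, [])}
        # Declared but missing: needs to be created.
        for name in sorted(want.keys() - have.keys()):
            plan["create"].append((kind, name))
        # Present on both sides but different: needs an update.
        for name in sorted(want.keys() & have.keys()):
            if want[name] != have[name]:
                plan["update"].append((kind, name))
        # Present but not declared: reported only, never deleted here.
        for name in sorted(have.keys() - want.keys()):
            plan["unmanaged"].append((kind, name))
    return plan


desired = {
    "principals": [{"name": "analyst"}, {"name": "etl_user"}],
    "catalogs": [{"name": "dev", "properties": {"owner": "data-eng"}}],
}
current = {
    "principals": [{"name": "analyst"}, {"name": "old_user"}],
    "catalogs": [{"name": "dev", "properties": {}}],
}
plan = plan_changes(desired, current)
```

A standalone bootstrapping tool and the sync tool could both consume a plan
like this, which is one way a shared frontend might work.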

Yufei


On Mon, Apr 28, 2025 at 7:10 PM Yong Zheng <yzh...@apache.org> wrote:

> Hi folks,
>
> We got a couple of requests from the community on Slack for an
> automation tool to bootstrap a list of entities in Polaris (e.g.
> principals).
>
> Currently the Python CLI only supports one entity modification at a time,
> and a custom wrapper script is needed to perform operations in bulk. Many
> people have ended up implementing custom wrappers in different ways to
> achieve this.
>
> To better support the community on this feature request, I ended up
> writing a POC implementation with the existing Python CLI (PR in
> https://github.com/apache/polaris/pull/1474).
>
> The POC is pretty basic and is really just meant to show that we could
> support quick environment bootstrap this way. Also, this can be easily
> integrated with CI for rolling out new entities (open a PR for the change
> and have CI call the CLI with the updated input config to roll out the new
> changes). So far, this is what the current PR supports.
>
> To make it more useful, we may want to support the following:
> 1. declarative approach: define what you need in the config file and
> we figure out the gap and roll out the changes needed to close it.
> This also means updates will be supported. We may want to think carefully
> about supporting deletes, as they can be destructive and we don't control
> all entities, such as tables/views.
> 2. setup export: export what is currently in the catalog (all polaris
> entities as well as tables/views? or add a flag to support what should be
> exported? think about this as mysqldump where users can decide what to dump
> and use the dump to restore or create new environments fully or partially)
>
> While chatting with Eric, I learned that the sync tool (
> https://github.com/apache/polaris-tools/tree/main/polaris-synchronizer)
> has a similar roadmap for supporting this feature (ML thread in
> https://lists.apache.org/thread/5p96vdvj5x68kfhk8f8vxo51v0y5x769). I
> would like to get some input from the community before adding more code to
> the existing PR as well as the potential features mentioned above.
>
> Based on my understanding, the existing sync tool syncs Polaris
> entities between two catalog servers as well as tables/views associated
> with the catalog. However, it doesn't currently support environment
> bootstrap or setup export. I don't have a strong preference for where the
> functionality should be added/implemented, but I do think having the
> ability to quickly import/export environments is handy and practical as we
> will have a set of environments and running commands line by line is not
> really feasible.
>
> Thanks,
> Yong Zheng
>
