Hi folks,

We have received a couple of requests from the community on Slack for an
automation tool to bootstrap a list of entities in Polaris (e.g. principals).

Currently the Python CLI only supports modifying one entity at a time, so a
custom wrapper script is needed to perform operations in bulk. Many people
have ended up implementing their own wrappers in different ways to achieve
this.

To better support the community on this feature request, I wrote a POC
implementation on top of the existing Python CLI (PR:
https://github.com/apache/polaris/pull/1474).

The POC is pretty basic; it is really just meant to show one way we could
support quick environment bootstrap. It can also be easily integrated with CI
for rolling out new entities (open a PR with the change, and have CI call the
CLI with the updated input config to roll out the new changes). This much is
already supported by the current PR.

To make it more useful, we may want to support the following:
1. Declarative approach: define what you need in the config file, and we
figure out the gap and roll out the changes needed to close it. This also
means updates will be supported. We may want to think carefully about
supporting deletes, as they can be destructive and we don't control all
entities, such as tables/views.
2. Setup export: export what is currently in the catalog (all Polaris entities
as well as tables/views? Or add a flag to select what should be exported?).
Think of this like mysqldump, where users can decide what to dump and use the
dump to restore or create new environments, fully or partially.
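
To make the declarative idea in item 1 a bit more concrete, here is a rough
sketch of how the "gap" could be computed. This is not what the PR does
today. The names Plan, compute_plan and allow_delete are hypothetical, and
the {entity_name: properties} shape is only an assumption for illustration;
the real input config and the calls to the Polaris API would look different.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Changes needed to move the current state to the desired state."""
    create: dict = field(default_factory=dict)
    update: dict = field(default_factory=dict)
    delete: list = field(default_factory=list)


def compute_plan(desired, current, allow_delete=False):
    """Diff two {entity_name: properties} mappings (hypothetical shape)."""
    plan = Plan()
    for name, props in desired.items():
        if name not in current:
            # Declared in the config but missing on the server.
            plan.create[name] = props
        elif current[name] != props:
            # Present on the server but with different properties.
            plan.update[name] = props
    if allow_delete:
        # Destructive, so only planned when explicitly requested.
        plan.delete = sorted(set(current) - set(desired))
    return plan


# Made-up example data: what the config declares vs. what the server has.
desired = {"alice": {"team": "data"}, "bob": {"team": "ml"}}
current = {"alice": {"team": "infra"}, "carol": {"team": "ops"}}

plan = compute_plan(desired, current)
print(plan.create)  # {'bob': {'team': 'ml'}}
print(plan.update)  # {'alice': {'team': 'data'}}
print(plan.delete)  # []
```

Keeping deletes behind an explicit opt-in is one way to handle the concern
above: by default, entities that are not listed in the config (for example
ones created outside of this tool) are left untouched.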

While chatting with Eric, I learned that the sync tool
(https://github.com/apache/polaris-tools/tree/main/polaris-synchronizer) has
a similar roadmap for supporting this feature (ML thread:
https://lists.apache.org/thread/5p96vdvj5x68kfhk8f8vxo51v0y5x769). I would like
to get some input from the community before adding more code to the existing
PR, as well as on the potential features mentioned above.

Based on my understanding, the existing sync tool syncs Polaris entities
between two catalog servers, as well as the tables/views associated with the
catalog. However, it doesn't currently support environment bootstrap or
setup export. I don't have a strong preference for where the functionality
should be implemented, but I do think the ability to quickly import/export
environments is handy and practical, since we will have a set of
environments and running commands one by one is not really feasible.

Thanks,
Yong Zheng
