Hi folks,

Laurent and I worked on a new proposal for Apache Polaris: Polaris Table Source.

The purpose is to provide a mechanism to create Iceberg tables in
Polaris corresponding to non-Iceberg data, allowing Polaris to be the
single catalog that enforces governance and gathers all data sources
in one place.
A user can register a source configuration in Polaris (Polaris will
have a Source Configuration registry). Then source services (not
running in Polaris; they are "external" services) use the registry to
create the corresponding tables in Polaris.
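To make the registry/service split a bit more concrete, here is a minimal
in-memory sketch, assuming a very simple registry API. All names
(`SourceConfig`, `SourceConfigRegistry`, `ExternalSourceService`, `sync`)
are hypothetical and not part of Polaris; a real source service would
create tables through the Polaris catalog API instead of recording table
names in a list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical source configuration entry; field names are illustrative."""
    name: str
    kind: str          # "structured", "unstructured" or "table_format"
    location: str      # where the source data lives, e.g. "s3://bucket/sales/"
    target_table: str  # Polaris table to create, e.g. "raw.sales"


class SourceConfigRegistry:
    """In-memory stand-in for the proposed Source Configuration registry."""

    def __init__(self) -> None:
        self._configs: Dict[str, SourceConfig] = {}

    def register(self, config: SourceConfig) -> None:
        if config.name in self._configs:
            raise ValueError(f"source {config.name!r} is already registered")
        self._configs[config.name] = config

    def list_configs(self) -> List[SourceConfig]:
        return list(self._configs.values())


class ExternalSourceService:
    """Runs outside Polaris and handles one kind of source."""

    def __init__(self, registry: SourceConfigRegistry, kind: str) -> None:
        self.registry = registry
        self.kind = kind
        self.created: List[str] = []

    def sync(self) -> List[str]:
        """Create a table for every matching config not handled yet."""
        for config in self.registry.list_configs():
            if config.kind == self.kind and config.target_table not in self.created:
                # Placeholder: a real service would create the table through
                # the Polaris catalog API here (assumption).
                self.created.append(config.target_table)
        return list(self.created)


registry = SourceConfigRegistry()
registry.register(SourceConfig("sales", "structured", "s3://bucket/sales/", "raw.sales"))
registry.register(SourceConfig("images", "unstructured", "s3://bucket/img/", "raw.images"))
service = ExternalSourceService(registry, "structured")
print(service.sync())  # ['raw.sales']
```

Keeping the services outside Polaris means each one only needs read access
to the registry plus the right to create tables, and new source kinds can
be added without changing Polaris itself.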
We distinguish three kinds of sources:
* structured data on a location (Parquet files, JSON files, CSV files,
XML files, ...): a source service creates the Iceberg tables
"wrapping" this data, and each created table uses the schema from the
"original" files.
* unstructured data on a location (image files, video files, PDF
files, ...): a source service "wraps" the location and the file
metadata in a table with a "fixed" schema (file location, etag, last
modification date, creation date, etc.).
* table format: here it would be possible to "import" a table into
Polaris from an existing table format. For instance, for existing
Iceberg tables, we can use the metadata.json as the "import" basis. We
can also support other table formats: Delta directly in Polaris (in
addition to the specific Spark client we use today), and Paimon as
well (see this discussion
https://github.com/apache/polaris/discussions/2453).
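As a rough illustration of how a source service could obtain the table
schema for each of the three kinds, here is a sketch assuming CSV input
for the structured case and an Iceberg format v2 metadata.json for the
table-format case. The function names and the column names of the fixed
unstructured schema are hypothetical; type inference for structured data
is deliberately naive (every column is a string).

```python
import csv
import io
import json
from typing import List, Tuple

Schema = List[Tuple[str, str]]  # (column name, Iceberg type name) pairs

# Fixed schema for unstructured sources. Columns follow the examples in the
# proposal; the exact names and types are assumptions.
UNSTRUCTURED_SCHEMA: Schema = [
    ("file_location", "string"),
    ("etag", "string"),
    ("last_modified", "timestamptz"),
    ("created", "timestamptz"),
]


def infer_csv_schema(sample: str) -> Schema:
    """Naive inference from a CSV header: every column is typed as string."""
    header = next(csv.reader(io.StringIO(sample)))
    return [(name.strip(), "string") for name in header]


def schema_from_iceberg_metadata(metadata_json: str) -> Schema:
    """Read the current schema from an Iceberg metadata.json (format v2 layout).

    Only top-level primitive fields are handled in this sketch.
    """
    meta = json.loads(metadata_json)
    current_id = meta["current-schema-id"]
    schema = next(s for s in meta["schemas"] if s["schema-id"] == current_id)
    return [(f["name"], f["type"]) for f in schema["fields"]]


def schema_for(kind: str, payload: str = "") -> Schema:
    """Pick the schema strategy for one of the three kinds of sources."""
    if kind == "structured":
        return infer_csv_schema(payload)
    if kind == "unstructured":
        return list(UNSTRUCTURED_SCHEMA)
    if kind == "table_format":
        return schema_from_iceberg_metadata(payload)
    raise ValueError(f"unknown source kind: {kind!r}")


print(schema_for("structured", "id,name,amount\n1,foo,9.5\n"))
# [('id', 'string'), ('name', 'string'), ('amount', 'string')]
```

The split mirrors the three kinds above: the schema is read from the data
(structured), fixed up front (unstructured), or reused as-is from existing
table metadata (table format).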

The detailed proposal document is here:
* https://docs.google.com/document/d/1OBDkPbWdf0Bq6Wa_BMKaXn-fqAxfdmepo57ggkkC8mI/edit?usp=sharing

Any feedback and comments are welcome!

Thanks!

Regards
JB
