Hi folks, Laurent and I worked on a new proposal for Apache Polaris: Polaris Table Source.
The purpose is to provide a mechanism for creating Iceberg tables in Polaris that correspond to non-Iceberg data, allowing Polaris to be the "unique" catalog that enforces governance and gathers all data sources in one place.

A user can register a source configuration in Polaris (Polaris will have a Source Configuration registry). Source services then use the registry to create the corresponding tables in Polaris. These services do not run in Polaris; they are "external" services.

We distinguish three kinds of sources:

* structured data on a location (Parquet files, JSON files, CSV files, XML files, ...): a source service creates the Iceberg tables "wrapping" this data, and each created table uses the schema from the "original" file.
* unstructured data on a location (image files, video files, PDF files): a source service "wraps" the location and the file metadata in a table with a "fixed" schema (file location, etag, last modification date, creation date, etc).
* table format: here it would be possible to "import" a table into Polaris using an existing table format. For instance, for existing Iceberg tables, we can use the metadata.json as the basis for the "import". We can also support other table formats: Delta directly in Polaris (in addition to using a specific Spark client as we do today) and Paimon (see this discussion: https://github.com/apache/polaris/discussions/2453).

The detailed proposal document is here:

* https://docs.google.com/document/d/1OBDkPbWdf0Bq6Wa_BMKaXn-fqAxfdmepo57ggkkC8mI/edit?usp=sharing

Any feedback and comments are welcome!

Thanks!

Regards
JB
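
PS: to make the three kinds of sources a bit more concrete, here is a rough Python sketch of how a source service might tell them apart and what a row of the "fixed" schema for unstructured data could look like. This is purely illustrative and is not taken from the proposal document: the names SourceKind, classify and UnstructuredFileRow, the extension lists, and the idea of classifying by file name are all assumptions made for the example.

```python
import posixpath
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class SourceKind(Enum):
    STRUCTURED = "structured"      # Parquet, JSON, CSV, XML, ...
    UNSTRUCTURED = "unstructured"  # images, videos, PDFs, ...
    TABLE_FORMAT = "table_format"  # e.g. an existing Iceberg metadata.json


# Hypothetical extension lists, for illustration only.
STRUCTURED_EXTENSIONS = {".parquet", ".json", ".csv", ".xml"}
UNSTRUCTURED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".mp4", ".pdf"}


@dataclass(frozen=True)
class UnstructuredFileRow:
    """One row of the assumed "fixed" schema for unstructured files."""
    file_location: str
    etag: str
    last_modified: datetime
    created: datetime


def classify(location: str) -> SourceKind:
    """Guess the source kind from the file name at the given location."""
    name = location.rsplit("/", 1)[-1].lower()
    # Check the table-format case first: an Iceberg metadata file also
    # ends in ".json" and would otherwise be treated as structured data.
    if name.endswith(".metadata.json"):
        return SourceKind.TABLE_FORMAT
    extension = posixpath.splitext(name)[1]
    if extension in STRUCTURED_EXTENSIONS:
        return SourceKind.STRUCTURED
    if extension in UNSTRUCTURED_EXTENSIONS:
        return SourceKind.UNSTRUCTURED
    raise ValueError(f"cannot classify source location: {location}")
```

In practice the kind would more likely come from the registered source configuration than from the file name; the sketch only shows how the three cases differ.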