georgelza opened a new issue, #3344: URL: https://github.com/apache/polaris/issues/3344
### Is your feature request related to a problem? Please describe.

The idea: a user defines a source endpoint (S3, local FS, or HDFS) under which they know Parquet files belonging to a table will be written. From that root, a job is created that produces a metadata file plus the associated manifest list and manifest files referencing the Parquet files that already exist under the provided root, all associated with a single table.

To push the luck button, when the user defines the endpoint they also provide a table name and an open table format (OTF), be that Iceberg, Hudi, or Paimon, and the catalog object gets created. The user then chooses one of two triggers:

1. a scan interval, or
2. for an S3 endpoint, a Lambda function that fires when a file is POST/PUT into the sub-directory structure.

When triggered, the table metadata is re-examined: if the manifest files differ from the previous scan, the new manifest lists and the new data/Parquet files are added to the catalog.

### Describe the solution you'd like

The above allows the basic table creation to be done via, let's imagine, a properly qualified PyFlink/PyIceberg job. After that, other sources might drop new Parquet files into the sub-structure; the trigger described above then fires and auto-discovers/adds them to the defined table, based on the OTF provided. Rough sketches of both trigger options follow below.

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
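To make the scan-interval option concrete, here is a minimal sketch, assuming Iceberg as the OTF, a PyIceberg catalog already configured under the name `polaris`, and a table that already exists. The bucket, prefix, table identifier, and interval are placeholders, and the "differs from the previous scan" check is simplified to a set of file paths not yet seen; it is only meant to show the shape of the discovery/registration loop, not a real implementation.

```python
import time

import boto3
from pyiceberg.catalog import load_catalog

# Hypothetical source endpoint and table; placeholders, not real config.
BUCKET = "my-landing-bucket"
PREFIX = "warehouse/orders/"       # root the user registered
TABLE_IDENT = "lakehouse.orders"   # table name the user supplied

catalog = load_catalog("polaris")  # assumes a configured PyIceberg catalog named "polaris"
table = catalog.load_table(TABLE_IDENT)


def list_parquet_files(bucket: str, prefix: str) -> set[str]:
    """Return the full s3:// paths of all Parquet files under the prefix."""
    s3 = boto3.client("s3")
    paths: set[str] = set()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                paths.add(f"s3://{bucket}/{obj['Key']}")
    return paths


def discover_and_register(previously_seen: set[str]) -> set[str]:
    """One scan tick: register any Parquet files not seen on the previous scan."""
    current = list_parquet_files(BUCKET, PREFIX)
    new_files = sorted(current - previously_seen)
    if new_files:
        # PyIceberg can register existing Parquet files without rewriting them,
        # provided their schema matches the table's.
        table.add_files(file_paths=new_files)
    return current


if __name__ == "__main__":
    # The "scan interval" option; in practice a scheduler or the catalog service
    # itself would own this loop.
    seen: set[str] = set()
    while True:
        seen = discover_and_register(seen)
        time.sleep(300)  # hypothetical 5-minute interval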

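The event-driven S3 option could then be little more than a Lambda handler that reuses the same registration call. The handler signature and the S3 event shape below are the standard AWS ones; the catalog and table names are again hypothetical placeholders that would come from whatever the user registered against the endpoint.

```python
from pyiceberg.catalog import load_catalog

# Placeholder identifier; the real value would come from the catalog entry
# created when the user registered the endpoint.
TABLE_IDENT = "lakehouse.orders"


def handler(event, context):
    """Fires on s3:ObjectCreated:* for the registered prefix and registers new Parquet files."""
    catalog = load_catalog("polaris")
    table = catalog.load_table(TABLE_IDENT)

    new_files = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if key.endswith(".parquet"):
            new_files.append(f"s3://{bucket}/{key}")

    if new_files:
        table.add_files(file_paths=new_files)

    return {"registered": new_files}
```

Either path ends in the same "add these data files to the table" call, so the scan-interval and event-driven variants could share most of their implementation.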