georgelza opened a new issue, #3344:
URL: https://github.com/apache/polaris/issues/3344

   ### Is your feature request related to a problem? Please describe.
   
   OK, the idea is this: a user defines a source endpoint where they know Parquet files for a table will be written. From that point a job is created which builds the table's metadata file, plus the manifest list and child manifest files referencing the Parquet files that already exist under the provided S3 root, all of it associated with a single table.
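
   A minimal sketch of what that bootstrap job could look like with PyIceberg against a Polaris REST catalog; the catalog name, connection properties, bucket, prefix, and table name below are all made-up placeholders, not existing config:

```python
# Sketch only: bootstrap an Iceberg table from Parquet files that already
# exist under a known S3 prefix. All names/URIs below are placeholders.
import boto3
import pyarrow.fs as pafs
import pyarrow.parquet as pq
from pyiceberg.catalog import load_catalog

BUCKET = "my-bucket"            # placeholder
PREFIX = "landing/orders/"      # the root the user pointed us at
TABLE = "landing.orders"        # table name supplied by the user

# 1. List the Parquet files already sitting under the prefix.
s3 = boto3.client("s3")
paths = []
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".parquet"):
            paths.append(f"s3://{BUCKET}/{obj['Key']}")

# 2. Connect to the (Polaris) REST catalog -- connection properties are assumptions.
catalog = load_catalog(
    "polaris",
    **{
        "uri": "http://localhost:8181/api/catalog",
        "credential": "client_id:client_secret",
        "warehouse": "my_warehouse",
    },
)

# 3. Derive a schema from one of the existing files (assumes at least one exists),
#    create the table, and register the files in place; add_files writes the
#    metadata/manifests without rewriting any data.
fs, first = pafs.FileSystem.from_uri(paths[0])
schema = pq.read_schema(first, filesystem=fs)
table = catalog.create_table(TABLE, schema=schema)  # recent PyIceberg accepts a PyArrow schema
table.add_files(file_paths=paths)
```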
   
   To push the luck button: when the user defines the S3, FS, or HDFS endpoint, they also provide a table name and an OTF (open table format), be that Iceberg, Hudi, or Paimon, and the catalog object then gets created.
   For change detection there are two options: the user specifies a scan interval, or, for an S3 endpoint, a Lambda function is added that fires whenever a file is POST/PUT in the sub-directory structure. Either trigger re-reads the table metadata, checks whether the data files on storage differ from the previous scan, and if so registers the new manifest entries and Parquet data files with the table in the catalog. A sketch of that discovery step follows below.
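
   The scan-interval option could boil down to diffing what is under the prefix against what the table already references; a rough sketch (again, all names are placeholders and an existing PyIceberg catalog object is assumed):

```python
# Sketch: periodic scan that registers any Parquet files not yet referenced
# by the table. Run it on a schedule (cron, Airflow, etc.).
import boto3


def register_new_files(catalog, table_name: str, bucket: str, prefix: str) -> list[str]:
    table = catalog.load_table(table_name)

    # Data files the current table metadata already references.
    known = {task.file.file_path for task in table.scan().plan_files()}

    # Everything currently sitting under the S3 prefix.
    s3 = boto3.client("s3")
    on_storage = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                on_storage.append(f"s3://{bucket}/{obj['Key']}")

    # Register only the files the table has not seen yet.
    new_files = [p for p in on_storage if p not in known]
    if new_files:
        table.add_files(file_paths=new_files)  # commits a new snapshot
    return new_files
```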
   
   
   ### Describe the solution you'd like
   
   ... the above allows the initial table creation to be done via, let's imagine, a properly qualified PyFlink/PyIceberg job, but after that other sources might drop new Parquet files into the sub-directory structure...
   The trigger described above would then fire and auto-discover/add those files to the defined tables, based on the OTF provided. A rough sketch of the event-driven variant is below.
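
   For the S3/Lambda option, the same append could be wired to the standard s3:ObjectCreated notification; a sketch (the handler wiring is the usual S3 event shape, the catalog and table names are placeholders):

```python
# Sketch: AWS Lambda handler fired on s3:ObjectCreated:* under the table prefix.
# It appends only the file(s) named in the event instead of re-scanning everything.
from pyiceberg.catalog import load_catalog


def lambda_handler(event, context):
    # Catalog connection properties would come from env vars / config; placeholder name.
    catalog = load_catalog("polaris")
    table = catalog.load_table("landing.orders")  # placeholder table identifier

    new_files = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if key.endswith(".parquet"):
            new_files.append(f"s3://{bucket}/{key}")

    if new_files:
        table.add_files(file_paths=new_files)
    return {"added": new_files}
```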
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_

