pvary commented on issue #13438:
URL: https://github.com/apache/iceberg/issues/13438#issuecomment-3088940093

   Having a detailed document at this point would be premature.
   
   The steps which are needed:
   1. Create an API layer for readers and writers - #12298 
   2. Create a test framework to verify that the readers and writers 
work according to the specification
       - After this point, if a file format (like Lance) decides that it wants 
to support Iceberg tables, it can create a test implementation which could be 
used for functional and performance testing
   3. The community needs to decide if/how it wants to support new file 
formats. I see the following (not mutually exclusive) possibilities myself:
       a. Add the new certified format to the FileFormat enum
       b. Change the FileFormat enum to a String, and allow "any" file format
       c. Add an intermediate layer to allow testing of file formats which 
implement this intermediate layer
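
To make the three options concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: `FormatRegistrySketch`, `CertifiedFormat`, `FormatSupport`, `register` and `lookup` are hypothetical names and are not part of Iceberg or of the API proposed in #12298.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these types exist in Iceberg.
final class FormatRegistrySketch {

  // Option (a): a closed enum. Supporting a new format means changing this type.
  enum CertifiedFormat { PARQUET, ORC, AVRO }

  // Option (c): an intermediate layer that a new file format would implement.
  interface FormatSupport {
    String name();
    String fileExtension();
  }

  // Option (b): formats are identified by a plain String and resolved at runtime.
  private static final Map<String, FormatSupport> REGISTRY = new ConcurrentHashMap<>();

  private FormatRegistrySketch() {}

  // Registers an implementation under its case-insensitive name.
  static void register(FormatSupport support) {
    String key = support.name().toLowerCase(Locale.ROOT);
    if (REGISTRY.putIfAbsent(key, support) != null) {
      throw new IllegalArgumentException("Format already registered: " + key);
    }
  }

  // Returns the implementation for the given name, if one was registered.
  static Optional<FormatSupport> lookup(String name) {
    return Optional.ofNullable(REGISTRY.get(name.toLowerCase(Locale.ROOT)));
  }
}
```

In this sketch a new format only has to implement `FormatSupport` and call `register`, while the enum in option (a) would need a code change and a release for every addition.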
   
   The benefits of integrating a new file format could be:
   1. Performance benefits, if reading/writing the format is more 
efficient than Parquet/ORC/Avro
   2. Compatibility benefit, as more engines could read the data. This is more 
pronounced with 3.c, since there is no need to implement the readers and writers 
for every FileFormat+Engine combination
   3. Catalog integration benefit, as tables could be organized by a single 
catalog.
   
   About @jackye1995's document:
   I like the integration solution he proposes. There is one point where 
we need to enhance the current File Format API proposal for this. Currently the 
readers and writers work on a single data stream. To enable the 
integration mentioned in the proposal, we need to pass the FileIO to the 
readers and writers so they can open additional data/index files.
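
The difference between the two reader shapes can be sketched in Java as follows. This is a simplified illustration, not the proposed API: `FileOpener` is a hypothetical stand-in for a FileIO-like abstraction, and `readSingleStream`, `readWithIndex` and `inMemory` are made-up helper names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical sketch: these types are not Iceberg's reader or FileIO API.
final class FileIoPushdownSketch {

  // Stand-in for a FileIO-like abstraction: opens any file by its location.
  interface FileOpener {
    InputStream open(String location);
  }

  private FileIoPushdownSketch() {}

  // Current shape (simplified): the reader is handed exactly one data stream.
  static String readSingleStream(InputStream data) {
    return readAll(data);
  }

  // Enhanced shape (simplified): the reader receives the opener and can
  // fetch additional files itself, for example an index next to the data.
  static String readWithIndex(FileOpener io, String dataLocation, String indexLocation) {
    String index = readAll(io.open(indexLocation));
    String rows = readAll(io.open(dataLocation));
    return index + "|" + rows;
  }

  // In-memory opener used only to make the sketch runnable.
  static FileOpener inMemory(Map<String, String> files) {
    return location -> {
      String content = files.get(location);
      if (content == null) {
        throw new UncheckedIOException(new IOException("No such file: " + location));
      }
      return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    };
  }

  // Reads the whole stream as UTF-8 text and closes it.
  private static String readAll(InputStream in) {
    try (InputStream stream = in) {
      return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With only a single stream, as in `readSingleStream`, the reader has no way to reach the index file; passing the opener in is what makes `readWithIndex` possible.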


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

