nssalian commented on PR #14297:
URL: https://github.com/apache/iceberg/pull/14297#issuecomment-4112833959

   Thanks for the feedback, @pvary. @aihuaxu, @pvary, and I synced offline to discuss how to move this forward. I'm summarizing the outcome here to make the review easier. I've made the following changes:

   1. Refactored, per @pvary's suggestion, to buffer above the writer. Added `BufferedFileAppender` in iceberg-core. It buffers the first N rows, infers the shredded schema from them, and then creates the real writer.
   2. Moved `VariantShreddingAnalyzer` from Spark to the parquet module as an abstract class, so Spark and Flink can both reuse it. @Guosmilesmile, you should eventually be able to reuse this in your PR.
   3. Added `Parquet.WriteBuilder.withFileSchema(MessageType)` to supply a precomputed Parquet schema at write time.
   4. Removed `WriterLazyInitializable`, `SparkParquetWriterWithVariantShredding`, and the 4-arg `WriterFunction`, since that wasn't the preferred pattern.
   5. Added more tests and an extra precision check for decimals.
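
The buffer-then-create flow in item 1 can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions. It is not the actual `BufferedFileAppender`. `SimpleAppender` and `BufferingAppender` are hypothetical stand-ins, and the schema-inference and writer-factory steps are passed in as plain functions rather than real Iceberg or Parquet APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical minimal appender interface (stand-in for a real file appender). */
interface SimpleAppender<D> extends AutoCloseable {
  void add(D datum);

  @Override
  void close();
}

/**
 * Hypothetical sketch: buffers the first {@code bufferSize} rows, infers a schema
 * from them, then creates the real appender and replays the buffered rows into it.
 */
final class BufferingAppender<D, S> implements SimpleAppender<D> {
  private final int bufferSize;
  private final Function<List<D>, S> inferSchema;
  private final Function<S, SimpleAppender<D>> createWriter;
  private final List<D> buffer = new ArrayList<>();
  private SimpleAppender<D> delegate; // null until the schema has been inferred

  BufferingAppender(
      int bufferSize,
      Function<List<D>, S> inferSchema,
      Function<S, SimpleAppender<D>> createWriter) {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("bufferSize must be positive");
    }
    this.bufferSize = bufferSize;
    this.inferSchema = inferSchema;
    this.createWriter = createWriter;
  }

  @Override
  public void add(D datum) {
    if (delegate != null) {
      delegate.add(datum); // schema already known: write straight through
      return;
    }
    buffer.add(datum);
    if (buffer.size() >= bufferSize) {
      initialize();
    }
  }

  /** Infers the schema from the buffered rows, opens the real writer, and replays the buffer. */
  private void initialize() {
    S schema = inferSchema.apply(new ArrayList<>(buffer)); // pass a copy of the sample
    delegate = createWriter.apply(schema);
    for (D datum : buffer) {
      delegate.add(datum);
    }
    buffer.clear();
  }

  @Override
  public void close() {
    if (delegate == null) {
      initialize(); // fewer than bufferSize rows arrived: infer from whatever was buffered
    }
    delegate.close();
  }
}
```

After the first N rows, every row goes straight to the real writer. If the input ends before N rows arrive, `close()` still infers a schema from the partial buffer, so small files get written too.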
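
For item 5, the following is one plausible form of a decimal precision check used when inferring a shredded type from sampled values. The names and the exact rule are assumptions, not the PR's implementation. The sketch widens to the largest scale and the largest number of integer digits it sees. It declines to shred as a decimal if the combined precision exceeds 38, which is the maximum precision of Iceberg's `decimal(P, S)` type. It uses a Java 16+ `record`.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Optional;

/** Hypothetical inferred decimal type for a shredded field. */
record DecimalType(int precision, int scale) {}

final class DecimalShreddingCheck {
  /** Maximum precision of Iceberg's decimal(P, S) type. */
  static final int MAX_PRECISION = 38;

  private DecimalShreddingCheck() {}

  /**
   * Returns the narrowest decimal(P, S) that can hold every sampled value,
   * or empty if there are no samples or P would exceed MAX_PRECISION.
   */
  static Optional<DecimalType> inferDecimalType(List<BigDecimal> samples) {
    if (samples.isEmpty()) {
      return Optional.empty();
    }
    int maxScale = 0;
    int maxIntegerDigits = 0;
    for (BigDecimal value : samples) {
      // Negative scales (e.g. 1E+3) contribute no fractional digits.
      int scale = Math.max(0, value.scale());
      // Digits left of the decimal point; 0.001 has none.
      int integerDigits = Math.max(0, value.precision() - value.scale());
      maxScale = Math.max(maxScale, scale);
      maxIntegerDigits = Math.max(maxIntegerDigits, integerDigits);
    }
    int precision = maxIntegerDigits + maxScale;
    if (precision > MAX_PRECISION) {
      return Optional.empty(); // needs more than 38 digits: do not shred as a decimal
    }
    return Optional.of(new DecimalType(precision, maxScale));
  }
}
```

For example, `12345678901234567890` (20 integer digits) and `0.123456789012345678901` (scale 21) each fit on their own. Together they would need precision 41, so the sketch returns an empty result instead of a type that cannot hold both.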
   
   @huaxingao, @pvary, @aihuaxu, please review when you have a chance.          


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
