[GitHub] [iceberg] stevenzwu edited a comment on pull request #3181: Parquet: make row group check (min and max record count) configurable.

GitBox Thu, 30 Sep 2021 19:14:19 -0700


stevenzwu edited a comment on pull request #3181:
URL: https://github.com/apache/iceberg/pull/3181#issuecomment-930764904



   @jackye1995 Please take another look.
   
   Regarding your comment that the new configs are more specific to the use 
case on the engine side: I don't think this is engine-specific. Sure, it 
probably matters a little more for streaming ingestion (Flink or Spark 
streaming), but it can matter for batch writes too. 
   
   E.g., we may want a smaller row group size (like 16 MB) so that files can 
be split into more splits for higher parallelism. If the average row size is 
large (on the order of MBs), the writer can buffer far more than the target 
before it checks the row group size, so we need to tune down these configs to 
get more accurate control over the target row group size. This is useful 
whenever we want tighter control over the row group size (and memory 
consumption), irrespective of streaming or batch write.
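
   A rough sketch of the effect, assuming a simplified model in which the 
writer inspects its buffered size only at scheduled record counts, bounded by a 
minimum and maximum check interval. The function name, parameters, and numbers 
below are hypothetical and for illustration only; this is not the actual writer 
code.

```python
MB = 1024 * 1024


def first_row_group_bytes(target_bytes, avg_row_bytes, min_check, max_check):
    """Return the bytes buffered when the first row group is flushed.

    Simplified, hypothetical model: the buffer size is inspected only when
    the record count reaches the next scheduled check, and the gap between
    checks is clamped to [min_check, max_check] records.
    """
    records = 0
    next_check = min_check
    while True:
        records += 1
        if records < next_check:
            continue  # no size check yet, keep buffering
        buffered = records * avg_row_bytes
        # Flush once the buffer is within two average rows of the target.
        if buffered > target_bytes - 2 * avg_row_bytes:
            return buffered
        # Otherwise schedule the next check about halfway to the target.
        remaining_records = (target_bytes - buffered) // avg_row_bytes
        step = min(max(remaining_records // 2, min_check), max_check)
        next_check = records + step


# Illustrative values: 16 MB target row group, 1 MB average row size.
coarse = first_row_group_bytes(16 * MB, 1 * MB, min_check=100, max_check=10000)
fine = first_row_group_bytes(16 * MB, 1 * MB, min_check=1, max_check=10000)
print(coarse // MB, fine // MB)
```

   Under this model, a minimum check interval of 100 records buffers about 
100 MB before the first size check, while a minimum of 1 record flushes at 
15 MB, much closer to the 16 MB target.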


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


