zhangjun0x01 commented on pull request #1624:
URL: https://github.com/apache/iceberg/pull/1624#issuecomment-714201951


   hi @simonsssu,
   No matter what this value is set to, there will always be files smaller
than the threshold that get compacted repeatedly. My idea is to find a
compromise for the various cases, for example running a Rewrite Action on
continuously written streaming data. The first Rewrite Action generates a
99M file, but by the time the second Rewrite Action runs, the streaming job
has produced many small files again. If one of those files is 10M, the 99M
file and the 10M file are merged into a 109M file, so that the 99M file is
not compacted over and over.
   
   If we rewrite that data file only once more, so that the 99M file is
regenerated a single time, I think that is acceptable.
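
   To make the compromise concrete, here is a minimal, self-contained sketch
of the grouping idea. It is only an illustration, not the actual Rewrite
Action code; the class name, the hypothetical 100M "already compacted"
cutoff, and the 128M target size are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustration only: group files below a size cutoff into rewrite groups of
 * roughly the target size, so an almost-full file is merged at most once more.
 */
public class CompactionGroupingSketch {

  static final long MB = 1024L * 1024L;
  // Hypothetical cutoff: files at or above this size count as already compacted.
  static final long ALREADY_COMPACTED = 100 * MB;
  // Hypothetical target size for a rewrite group.
  static final long TARGET_SIZE = 128 * MB;

  /** Plans rewrite groups from a list of file sizes (in bytes). */
  static List<List<Long>> planGroups(List<Long> fileSizes) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentBytes = 0;
    for (long size : fileSizes) {
      if (size >= ALREADY_COMPACTED) {
        continue; // leave already-compacted files alone on later runs
      }
      current.add(size);
      currentBytes += size;
      if (currentBytes >= TARGET_SIZE) {
        groups.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
    }
    if (current.size() > 1) { // a single leftover file gains nothing from rewriting
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    // First Rewrite Action produced a 99M file; the streaming job then added a 10M file.
    List<Long> sizes = List.of(99 * MB, 10 * MB);
    // Prints "rewrite group of 109M": the 99M file is merged once more with the 10M file.
    for (List<Long> group : planGroups(sizes)) {
      long totalMb = group.stream().mapToLong(Long::longValue).sum() / MB;
      System.out.println("rewrite group of " + totalMb + "M");
    }
  }
}
```

   On a second run the resulting 109M file is at or above the cutoff, so it
is skipped, which is the "rewritten only once more" behaviour described
above.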

