zhangjun0x01 commented on pull request #1624: URL: https://github.com/apache/iceberg/pull/1624#issuecomment-714201951
hi @simonsssu: No matter what this value is set to, there will always be files smaller than the threshold that get compacted repeatedly. My idea is to find a compromise that works across the common cases, such as running the rewrite action on continuously written streaming data. Suppose the first rewrite action produces a 99M file, but by the time the second rewrite action runs, the streaming job has generated a lot of small files again, say 10M each. Merging the 99M file together with the 10M files into a 109M file avoids compacting that 99M file over and over. If the 99M file only has to be rewritten once more before it reaches the target size, I think that is acceptable.
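To illustrate the grouping idea described above, here is a minimal, hypothetical sketch (plain Java, not the actual RewriteDataFilesAction code): files below the target size are packed together, so a near-target file such as the 99M one joins a group with newer small files and is rewritten at most once more. The `FileInfo` type, the sizes, and `planGroups` are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionGroupingSketch {

  // Hypothetical description of a data file; not an Iceberg class.
  record FileInfo(String path, long sizeBytes) {}

  // Assumed target file size for the rewrite, e.g. 128 MB.
  static final long TARGET_SIZE = 128L * 1024 * 1024;

  // Pack files smaller than the target into groups that reach the target size,
  // so a 99M file plus a 10M file is rewritten together (109M) instead of the
  // 99M file being compacted again and again on its own.
  static List<List<FileInfo>> planGroups(List<FileInfo> files) {
    List<List<FileInfo>> groups = new ArrayList<>();
    List<FileInfo> current = new ArrayList<>();
    long currentSize = 0;
    for (FileInfo f : files) {
      if (f.sizeBytes() >= TARGET_SIZE) {
        continue; // already large enough, never rewrite
      }
      current.add(f);
      currentSize += f.sizeBytes();
      if (currentSize >= TARGET_SIZE) { // e.g. 99M + 10M + ... crossed the target
        groups.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
    }
    if (current.size() > 1) { // leftover small files are still worth merging
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<FileInfo> files = List.of(
        new FileInfo("data-0.parquet", 99L * 1024 * 1024), // output of the first rewrite
        new FileInfo("data-1.parquet", 10L * 1024 * 1024), // new small files from streaming
        new FileInfo("data-2.parquet", 30L * 1024 * 1024));
    planGroups(files).forEach(System.out::println);
  }
}
```

Under this sketch, the 99M file is only touched while it is still below the target; once a group pushes it past the target size it is left alone by later rewrites.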