hameizi commented on pull request #2867: URL: https://github.com/apache/iceberg/pull/2867#issuecomment-886502378
> > This adds one feature: Flink writes to Iceberg auto-compact small files.
>
> Possibly I'm missing something, but I don't see any accounting for files that might already be near, or close to, the optimal size. It's late and my eyes may deceive me, but this appears to compact all files toward the target file size in bytes, regardless of their existing size etc. In some cases, the cost of opening and rewriting provides less value than leaving the data as is. Can we account for this like we do in some other places? Or am I just missing the fact that that functionality is hidden elsewhere?
>
> This would be a good topic to consider discussing in the mentioned GitHub issue :)

It compacts the files returned by the partition filter, so it will compact all files when the table has no partitions. But in my work scenario this is not a problem, because we compact on every transaction, and each transaction only generates a few small files. So compacting on every transaction is quick and cheap, and the compaction time will not exceed the transaction time either.
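For illustration only, here is a minimal sketch of the kind of size accounting the review comment asks about: filter the candidate files by size before planning the rewrite, so files already near the target are never opened. All names and the 75% threshold here are hypothetical, not this PR's code or Iceberg's API:

```java
import java.util.List;
import java.util.stream.Collectors;

public class CompactionPlanner {
    // Hypothetical cutoff: only rewrite files smaller than 75% of the target size.
    static final double MIN_SIZE_RATIO = 0.75;

    // Minimal stand-in for a data file's metadata (hypothetical, not Iceberg's DataFile).
    static class FileInfo {
        final String path;
        final long sizeInBytes;
        FileInfo(String path, long sizeInBytes) {
            this.path = path;
            this.sizeInBytes = sizeInBytes;
        }
    }

    // Keep only files well below the target size; files already near the
    // optimal size are skipped, avoiding the open-and-rewrite cost.
    static List<FileInfo> selectFilesToCompact(List<FileInfo> candidates,
                                               long targetFileSizeBytes) {
        long threshold = (long) (targetFileSizeBytes * MIN_SIZE_RATIO);
        return candidates.stream()
                .filter(f -> f.sizeInBytes < threshold)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FileInfo> files = List.of(
                new FileInfo("part-0.parquet", 4L * 1024 * 1024),     // 4 MB: compact
                new FileInfo("part-1.parquet", 120L * 1024 * 1024));  // 120 MB: leave as is
        selectFilesToCompact(files, 128L * 1024 * 1024)
                .forEach(f -> System.out.println("compact: " + f.path));
    }
}
```

In this sketch, with a 128 MB target only the 4 MB file is selected, while the 120 MB file is left as-is, which is the behavior the reviewer suggests accounting for.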
