xloya commented on pull request #3977: URL: https://github.com/apache/iceberg/pull/3977#issuecomment-1032135907
> My concern with this approach is that it can potentially create more small files (with close and open for the same partition after cache eviction). @xloya can you share some of the results that you tried with this approach? I am sure it can help with memory usage, but does it create more small files?
>
> I shared a design doc on shuffling support in Flink sink with the community a few months ago. That was a different approach. https://docs.google.com/document/d/13N8cMqPi-ZPSKbkXGOBMPOzbv2Fua59j8bIjjtxLWqo/edit#heading=h.o4q8a61sahkq

Yes, this will increase the number of small files, but with a reasonable configuration I think it can stay in a relatively balanced state. I think the approach you mentioned is feasible. We will apply a `keyBy` operation in the Flink write path to solve this problem.
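The trade-off in this thread can be illustrated with a toy simulation. It is a hypothetical sketch and not Iceberg or Flink code. Each sink subtask keeps at most `maxOpenWriters` partition writers in an LRU cache. Evicting a writer closes its file, and a later record for that partition opens a new file. Records are routed to subtasks either round-robin or by a hash of the partition key. The hash routing is a simplified stand-in for a Flink `keyBy` on the partition key; Flink's real key-group assignment differs. All names (`WriterCacheSim`, `LruWriterCache`, `simulate`, `maxOpenWriters`) are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of per-subtask partition writers with LRU eviction. */
public class WriterCacheSim {

    /** One subtask's cache of open writers; counts every file it opens. */
    static final class LruWriterCache {
        private final Map<String, Boolean> open;
        private int filesOpened = 0;

        LruWriterCache(int maxOpenWriters) {
            // accessOrder = true keeps iteration order least-recently-used first.
            this.open = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    // Dropping the eldest entry models closing that partition's file.
                    return size() > maxOpenWriters;
                }
            };
        }

        void write(String partition) {
            // get() also refreshes the entry's position in the LRU order.
            if (open.get(partition) == null) {
                open.put(partition, Boolean.TRUE); // may evict (close) another writer
                filesOpened++;                     // a new data file is started
            }
        }

        int filesOpened() {
            return filesOpened;
        }
    }

    /**
     * Sends `rounds` passes over `numPartitions` partitions to `parallelism`
     * subtasks and returns the total number of files opened.
     */
    public static int simulate(int numPartitions, int rounds, int parallelism,
                               int maxOpenWriters, boolean keyByPartition) {
        LruWriterCache[] subtasks = new LruWriterCache[parallelism];
        for (int i = 0; i < parallelism; i++) {
            subtasks[i] = new LruWriterCache(maxOpenWriters);
        }
        int recordIndex = 0;
        for (int r = 0; r < rounds; r++) {
            for (int p = 0; p < numPartitions; p++) {
                String partition = "p" + p;
                // keyBy-style: a partition always goes to the same subtask.
                // Round-robin: records are spread regardless of partition.
                int target = keyByPartition
                        ? Math.floorMod(partition.hashCode(), parallelism)
                        : recordIndex % parallelism;
                subtasks[target].write(partition);
                recordIndex++;
            }
        }
        int total = 0;
        for (LruWriterCache cache : subtasks) {
            total += cache.filesOpened();
        }
        return total;
    }

    public static void main(String[] args) {
        // 6 partitions, 4 rounds, parallelism 4, at most 2 open writers per subtask.
        System.out.println("round-robin: " + simulate(6, 4, 4, 2, false) + " files"); // prints "round-robin: 24 files"
        System.out.println("keyBy:       " + simulate(6, 4, 4, 2, true) + " files");  // prints "keyBy:       6 files"
    }
}
```

With round-robin routing, every subtask sees more partitions than it can keep open, so each record triggers an eviction and a new file. Even with a large enough cache, each partition is still written by several subtasks. Routing by partition key bounds how many partitions each subtask handles, which is why a `keyBy` shuffle addresses the small-file concern raised in review.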
