xloya commented on pull request #3977:
URL: https://github.com/apache/iceberg/pull/3977#issuecomment-1032133971


   
   > @aokolnychyi, do you think this is a good idea?
   > 
   > I'm not sure about this. What will end up happening in Spark is that 
you'll create a lot of new data files. But in that case you should have used a 
better plan that clustered data instead of using the fanout writer. Maybe this 
is needed in Flink only?
   
   In fact, this happens when a user writes to all partitions with Spark `insert 
overwrite`. We have tried several approaches; unless we use `distribute by` in the 
SQL to cluster the data by partition before the write, the OOM still appears.
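   As a rough sketch of the workaround (not the exact query from this thread; the 
table and partition-column names below are hypothetical), `distribute by` shuffles 
rows so that each write task receives only a few partitions' worth of data, which 
keeps the number of open files per task bounded:
   
   ```sql
   -- Hypothetical table `db.logs` partitioned by `event_date`.
   -- DISTRIBUTE BY clusters rows by the partition column before the write,
   -- so each task writes to far fewer partitions and holds fewer open files.
   INSERT OVERWRITE TABLE db.logs
   SELECT * FROM db.staging_logs
   DISTRIBUTE BY event_date;
   ```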


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
