xloya commented on pull request #3977: URL: https://github.com/apache/iceberg/pull/3977#issuecomment-1032133971
> @aokolnychyi, do you think this is a good idea?
>
> I'm not sure about this. What will end up happening in Spark is that you'll create a lot of new data files. But in that case you should have used a better plan that clustered data instead of using the fanout writer. Maybe this is needed in Flink only?

In fact, this happens when a user writes to all partitions using Spark `insert overwrite`. We have tried several approaches; unless we use `distribute by` in SQL to cluster the data by partition, the OOM still appears.
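For concreteness, here is a minimal sketch of the `distribute by` workaround mentioned above, assuming a hypothetical Iceberg table `db.events` partitioned by `event_date` and a hypothetical source table `staging_events`:

```sql
-- Clustering the input by the partition column before the overwrite means each
-- Spark task receives rows for only a few table partitions, so the writer keeps
-- only a few data files open at a time instead of one per partition per task.
INSERT OVERWRITE db.events
SELECT * FROM staging_events
DISTRIBUTE BY event_date;
```

Without the `DISTRIBUTE BY`, every task can receive rows for many partitions, and the per-task open writers are what drive the memory pressure we see.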
