ldwnt commented on issue #6956:
URL: https://github.com/apache/iceberg/issues/6956#issuecomment-1449386215

   > @ldwnt, if the upstream table is completely refreshed every day, then why use a stream to move the data over to analytic storage? Seems like using a one-time copy after the refresh makes more sense.
   > 
   > I also think that, in general, directly updating an analytic table from Flink is a bad idea. It's usually much more efficient to write the changes directly into a table and periodically compact to materialize the latest table state.
   
   It's possible to handle the fully refreshed tables the way you mentioned. The reason I don't is that I'm ingesting tables from 20 MySQL databases into Iceberg and want to achieve that with the same set of Flink applications.
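   For context, the ingestion jobs follow the usual Flink MySQL CDC pattern with a parameterized database/table list, so one application can be deployed against each of the 20 instances. A minimal sketch of that pattern, assuming the Ververica flink-connector-mysql-cdc API and placeholder hostnames/credentials (not the actual job code):

```java
// Sketch only: a generic CDC source whose database/table list is injected per
// deployment, so the same Flink application can serve every MySQL instance.
import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class GenericMysqlIngestJob {
  public static void main(String[] args) throws Exception {
    // Placeholders: in practice these come from job parameters / configuration.
    MySqlSource<String> source = MySqlSource.<String>builder()
        .hostname("mysql-host")
        .port(3306)
        .databaseList("db_.*")            // regex covering the target databases
        .tableList("db_.*\\..*")          // regex covering the target tables
        .username("user")
        .password("password")
        .deserializer(new JsonDebeziumDeserializationSchema())
        .build();

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000);      // checkpointing drives downstream commits
    env.fromSource(source, WatermarkStrategy.noWatermarks(), "mysql-cdc-source")
        .print();                         // the real job writes to Iceberg instead of print
    env.execute("generic-mysql-cdc-ingest");
  }
}
```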
   
   I changed the Spark executor memory from 3g to 4g and the rewrite finished without an OOM. The cause of the OOM seems to be the large number of delete records collected in memory. Also, the ~1.4g executor memory usage shown in the Spark UI is not accurate.
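   For reference, the compaction can be driven either via the rewrite_data_files Spark procedure or the Spark actions API; below is a minimal sketch of the latter with the bumped executor memory. The catalog/table identifier is a placeholder, and in cluster deployments spark.executor.memory normally has to be passed at submit time (e.g. spark-submit --conf spark.executor.memory=4g) rather than in the session builder:

```java
// Sketch, not the exact job: Iceberg data file compaction with 4g executors.
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteDataFiles;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.actions.SparkActions;
import org.apache.spark.sql.SparkSession;

public class RewriteWithMoreExecutorMemory {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-rewrite-data-files")
        .config("spark.executor.memory", "4g")   // 3g executors OOMed on the delete records
        .getOrCreate();

    // Placeholder identifier; adjust to the real catalog/namespace/table.
    Table table = Spark3Util.loadIcebergTable(spark, "my_catalog.db.tbl");

    RewriteDataFiles.Result result = SparkActions.get(spark)
        .rewriteDataFiles(table)
        .execute();

    System.out.println("rewritten data files: " + result.rewrittenDataFilesCount()
        + ", added data files: " + result.addedDataFilesCount());
  }
}
```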

