[GitHub] [iceberg] kbendick commented on issue #3885: [OOM] MERGE INTO table with Spark Structured Streaming

GitBox Tue, 18 Jan 2022 00:22:26 -0800


kbendick commented on issue #3885:
URL: https://github.com/apache/iceberg/issues/3885#issuecomment-1015171549



   `unpersist` is non-blocking by default. Consider passing `blocking=true` 
(I believe that's the argument name) so that the call returns only after the 
dataframe's cached blocks have actually been removed. That adds some time to 
each call, since it blocks, but it lowers the likelihood of OOMs if that is 
where you think they are coming from.
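
   A minimal PySpark-style sketch of that idea, assuming the merge runs inside 
a per-micro-batch handler (for example one registered with `foreachBatch`). 
The names `merge_micro_batch` and `merge_fn` are hypothetical and not from the 
issue; the only DataFrame methods used are `persist()` and 
`unpersist(blocking=True)`.

```python
def merge_micro_batch(batch_df, batch_id, merge_fn):
    """Cache one micro-batch, run the merge, then release the cache synchronously.

    merge_fn is a hypothetical callable (batch_df, batch_id) -> None that
    would issue the MERGE INTO for this batch.
    """
    batch_df.persist()
    try:
        merge_fn(batch_df, batch_id)
    finally:
        # blocking=True waits until the cached blocks are removed, so the
        # next micro-batch does not start while this one is still cached.
        batch_df.unpersist(blocking=True)
```

   Putting the `unpersist` call in `finally` means the cache is also released 
when the merge itself fails, which may matter if OOMs are already occurring.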
   
   That's a small adjustment you can make to see whether it helps. But I would 
go with Russell's answer.
   
   Also, if you update to Spark 3.1, you can use multiple MATCHED clauses in 
the same query, which could also significantly reduce your runtime (or at 
least code complexity).
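
   As a rough illustration of what a single query with several MATCHED clauses 
could look like, here is a sketch that builds the SQL text. The table and 
column names (`id`, `op`, `count`) and the helper `build_merge_sql` are 
made up for illustration and are not taken from the issue.

```python
def build_merge_sql(target: str, source: str) -> str:
    """Return one MERGE INTO statement with two WHEN MATCHED clauses.

    All table and column names here are hypothetical placeholders.
    """
    return (
        f"MERGE INTO {target} t\n"
        f"USING {source} s\n"
        "ON t.id = s.id\n"
        # First matched clause: delete rows flagged for deletion.
        "WHEN MATCHED AND s.op = 'delete' THEN DELETE\n"
        # Second matched clause: update every other matching row.
        "WHEN MATCHED THEN UPDATE SET t.count = s.count\n"
        # Unmatched source rows are inserted.
        "WHEN NOT MATCHED THEN INSERT *"
    )
```

   The resulting string would be passed to `spark.sql(...)`, replacing what 
would otherwise be separate delete and update merges over the same batch.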


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


