xloya commented on PR #7096:
URL: https://github.com/apache/iceberg/pull/7096#issuecomment-1478802983

   > > > In that case couldn't you set something like `sql("SET spark.sql.autoBroadcastJoinThreshold=-1")` before the DeleteOrphanFilesSparkAction and change it back to the default once it finishes?
   > > 
   > > 
   > > If I had known this, I would definitely have set it up that way. But in practice, not all users know how to configure it; they usually only find out after an OOM occurs, and by then I think the cost may be even greater.
   > 
   > I think this is always a problem with any Spark job that involves a join. If the join estimations are wrong, you get an OOM. That doesn't mean we should completely disable broadcast joins for everybody. If your users are unable to disable broadcast joins via config, then maybe you can disable it for them in your own fork so that it doesn't affect everybody else who is using Iceberg.
   
   Yes, we have disabled it internally for now, but I personally think that is still not the best approach. That is why I proposed this patch to the community: to see whether anyone can optimize the action and fundamentally solve the problem of inaccurate estimates.
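A minimal PySpark sketch of the workaround suggested in the thread: temporarily set `spark.sql.autoBroadcastJoinThreshold` to `-1`, run the orphan-file cleanup, and restore the previous value afterwards, even if the job fails. The helper name `broadcast_joins_disabled` is hypothetical; it only assumes a `SparkSession` whose `spark.conf` exposes `get(key)` and `set(key, value)`, as PySpark's `RuntimeConfig` does.

```python
from contextlib import contextmanager

# Spark SQL setting that controls the size limit for automatic broadcast joins.
BROADCAST_KEY = "spark.sql.autoBroadcastJoinThreshold"


@contextmanager
def broadcast_joins_disabled(spark):
    """Disable automatic broadcast joins for the duration of the block.

    Hypothetical helper: reads the current threshold, sets it to -1
    (which turns off automatic broadcast joins), and writes the saved
    value back on exit, whether the block succeeds or raises.
    """
    previous = spark.conf.get(BROADCAST_KEY)
    spark.conf.set(BROADCAST_KEY, "-1")
    try:
        yield
    finally:
        # Restore the value observed on entry (e.g. Spark's default threshold).
        spark.conf.set(BROADCAST_KEY, previous)
```

For example, with Iceberg's `remove_orphan_files` stored procedure (the catalog and table names are placeholders): `with broadcast_joins_disabled(spark): spark.sql("CALL my_catalog.system.remove_orphan_files(table => 'db.sample')")`. Runtime configuration is scoped to the `SparkSession`, so any other queries running concurrently on the same session would also run without broadcast joins while the block is active.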

