sririshindra commented on PR #7096:
URL: https://github.com/apache/iceberg/pull/7096#issuecomment-1478318029
> > In that case, couldn't you set something like `sql("SET spark.sql.autoBroadcastJoinThreshold=-1")` before the DeleteOrphanFilesSparkAction and change it back to the default once it finishes?
>
> If I had known this, I would definitely have set it up that way. But in practice, not all users know how to configure this. They only find out after an OOM occurs, and by then I think the cost may be even greater.
I think this is a general problem with any Spark job that involves a join: if the join size estimates are wrong, you can get an OOM. That doesn't mean we should completely disable broadcast joins for everybody. If your users are unable to disable broadcast joins via config, then maybe you can disable them on your own fork, so that it doesn't affect everybody else who is using Iceberg.
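The config-based workaround can be wrapped so the threshold is always restored, even if the action fails. This is a minimal sketch, not an Iceberg or Spark API: `broadcast_joins_disabled` is a hypothetical helper, and it assumes `conf` behaves like PySpark's `spark.conf` (a `RuntimeConfig` exposing `get`, `set`, and `unset`).

```python
from contextlib import contextmanager

BROADCAST_KEY = "spark.sql.autoBroadcastJoinThreshold"


@contextmanager
def broadcast_joins_disabled(conf):
    """Hypothetical helper: temporarily set autoBroadcastJoinThreshold to -1.

    `conf` is assumed to behave like pyspark's RuntimeConfig (spark.conf),
    returning None from get(key, None) when the key was not set explicitly.
    """
    previous = conf.get(BROADCAST_KEY, None)
    conf.set(BROADCAST_KEY, "-1")  # -1 disables automatic broadcast joins
    try:
        yield
    finally:
        # Restore the caller's explicit value, or unset it so Spark's default applies again.
        if previous is None:
            conf.unset(BROADCAST_KEY)
        else:
            conf.set(BROADCAST_KEY, previous)
```

For example, one might run `spark.sql("CALL my_catalog.system.remove_orphan_files(table => 'db.sample')")` inside a `with broadcast_joins_disabled(spark.conf):` block, where `my_catalog` and `db.sample` are placeholder names. Note that `spark.conf` is session-scoped, so other queries running concurrently on the same session would also see the lowered threshold.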
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]