sririshindra commented on code in PR #7096:
URL: https://github.com/apache/iceberg/pull/7096#discussion_r1141210980
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##########
@@ -123,10 +124,10 @@ public class DeleteOrphanFilesSparkAction extends
BaseSparkAction<DeleteOrphanFi
private ExecutorService deleteExecutorService = null;
DeleteOrphanFilesSparkAction(SparkSession spark, Table table) {
- super(spark);
-
- this.hadoopConf = new
SerializableConfiguration(spark.sessionState().newHadoopConf());
- this.listingParallelism =
spark.sessionState().conf().parallelPartitionDiscoveryParallelism();
+ super(spark.cloneSession());
+ spark().conf().set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD().key(), -1);
Review Comment:
I am a bit confused. Why are we hard coding this config here? Even if a user
desires to use broadcastHashJoin for DeleteOrphanFilesSparkAction, any
configuration they set up while initializing their SparkSession to force a
broadcast join would be ignored, making it impossible to use..
For instance a user might want to set
`spark().conf().set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD().key(), 100MB)` so
that their code doesn't OOM out or maybe their jobs are small enough that they
don't see the oom issue at all. Here we are simply overriding whatever user
might have configured for their job and forcing them to use always use
SortMergeJoin.
If the user wants to disable broadcastJoin entirely for
DeleteOrphanFilesSparkAction, they can do so in their spark config themselves.
I think we shouldn't take away that choice from the user.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]