Hi all: 
i want to ask a question about broadcast nestloop join? from google i know, 
that 
 left outer/semi join and right outer/semi join will use broadcast nestloop.
  and in some cases, when the input data is very small, it is suitable to use. 
so here
  how to define the input data very small? what parameter decides the 
threshold?  I just want to disable it ( i found that   set 
spark.sql.autoBroadcastJoinThreshold= -1 is no work for sql:select a.key1  from 
testdata1 as a where a.key1 not in (select key3 from testdata3) )




```


explain cost select a.key1  from testdata1 as a where a.key1 not in (select 
key3 from testdata3);


== Physical Plan ==
*(1) Project [key1#90]
+- BroadcastNestedLoopJoin BuildRight, LeftAnti, ((key1#90 = key3#92) || 
isnull((key1#90 = key3#92)))
   :- HiveTableScan [key1#90], HiveTableRelation `default`.`testdata1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key1#90, value1#91]
   +- BroadcastExchange IdentityBroadcastMode
      +- HiveTableScan [key3#92], HiveTableRelation `default`.`testdata3`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key3#92, value3#93]


```


  my question is 
  1. why in not in subquery , BroadcastNestedLoopJoin is still used even i set 
spark.sql.autoBroadcastJoinThreshold= -1 
  2. which spark parameter  decides enable/disable BroadcastNestedLoopJoin.






Appreciate if you have suggestion




Best Regards


Kelly Zhang

Reply via email to