Hi Hao,

Each table is created with the following Python code snippet:

    import json
    from math import ceil
    from random import random

    data = [{'id': 'A%d' % i, 'value': ceil(random() * 10)} for i in range(0, 50)]
    with open('A.json', 'w+') as output:
        json.dump(data, output)

Tables A and B contain 10 and 50 tuples, respectively. In the Spark shell I run

    sqlContext.setConf("spark.sql.planner.sortMergeJoin", "false")

to disable SortMergeJoin, and

    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "0")

to disable BroadcastHashJoin, because the tables are small enough that it would otherwise be selected. Finally, I run the following query:

    t1.join(t2).where(t1("id").equalTo(t2("id"))).count

The result is zero, while BroadcastHashJoin and SortMergeJoin both return the correct result (10).

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/ShuffledHashJoin-Possible-Issue-tp14672p14682.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
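The expected count of 10 can be cross-checked outside Spark. Below is a minimal Python sketch of what a shuffled hash join does:

1. It hash-partitions both inputs on `id`.
2. For each partition, it builds a hash table from one side.
3. It probes that table with the other side's rows from the same partition.

This sketch assumes, hypothetically, that A is generated with `range(0, 10)` and B with `range(0, 50)`, and that both use the `'A%d'` id prefix from the snippet. The overlap of ids is what makes 10 the correct answer. The helper names `make_table` and `shuffled_hash_join_count` and the partition count are illustrative only. This is not Spark's implementation.

```python
from math import ceil
from random import random


def make_table(n):
    # Same row shape as the snippet above: ids 'A0'..'A{n-1}', values in 1..10.
    # Hypothetical helper; n=10 stands in for table A, n=50 for table B.
    return [{'id': 'A%d' % i, 'value': ceil(random() * 10)} for i in range(n)]


def shuffled_hash_join_count(left, right, key='id', num_partitions=4):
    """Count matching row pairs the way a shuffled hash join would.

    1. Shuffle: route every row to a partition chosen by hash(key), so rows
       with equal keys from both sides end up in the same partition index.
    2. Build: per partition, index the left side's rows by key.
    3. Probe: look up each right-side row of the same partition.
    """
    def shuffle(rows):
        parts = [[] for _ in range(num_partitions)]
        for row in rows:
            parts[hash(row[key]) % num_partitions].append(row)
        return parts

    left_parts, right_parts = shuffle(left), shuffle(right)
    count = 0
    for left_part, right_part in zip(left_parts, right_parts):
        build = {}
        for row in left_part:
            build.setdefault(row[key], []).append(row)
        for row in right_part:
            count += len(build.get(row[key], []))
    return count


t1 = make_table(10)
t2 = make_table(50)
print(shuffled_hash_join_count(t1, t2))  # 10
```

The result does not depend on the number of partitions or on which side is used to build the hash table. A correct shuffled hash join over these inputs should therefore agree with the nested-loop answer of 10.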