ShuffledHashJoin Possible Issue

gsvic Sun, 18 Oct 2015 12:56:16 -0700

I am running some experiments with join algorithms in Spark SQL and have run
into the following issue:


I have constructed two "dummy" JSON tables, t1.json and t2.json. Each of them
has two columns, ID and Value. ID is a unique, incrementing integer and
Value is a random value. I am running an equi-join query on the ID attribute.
With the SortMerge and BroadcastHashJoin algorithms the result is correct,
but with ShuffledHashJoin the count aggregate always returns zero. The
correct count is the number of rows in t2, since t2.ID is a subset of t1.ID.

The query is *t1.join(t2).where(t1("ID").equalTo(t2("ID")))*
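
For reference, here is a minimal sketch in plain Scala of what a shuffled hash
join is expected to compute on tables like these. It is not Spark's
implementation: the names Row, shuffle, shuffledHashJoin and numPartitions, as
well as the sample data, are hypothetical and chosen only for illustration. It
assumes both sides are partitioned by a hash of ID, a hash table is built on
one side of each partition, and the other side probes it.

```scala
object ShuffledHashJoinSketch {
  // One row of the hypothetical tables: a unique integer ID and some value.
  final case class Row(id: Int, value: Double)

  // Shuffle step: route every row to a partition chosen by hashing its join key,
  // so rows with equal IDs from both tables land in the same partition.
  def shuffle(rows: Seq[Row], numPartitions: Int): Map[Int, Seq[Row]] =
    rows.groupBy(r => Math.floorMod(r.id.hashCode, numPartitions))

  // Join step: per partition, build a hash table on the right side and probe
  // it with the rows of the left side.
  def shuffledHashJoin(left: Seq[Row], right: Seq[Row], numPartitions: Int): Seq[(Row, Row)] = {
    val leftParts = shuffle(left, numPartitions)
    val rightParts = shuffle(right, numPartitions)
    (0 until numPartitions).flatMap { p =>
      val built: Map[Int, Seq[Row]] = rightParts.getOrElse(p, Seq.empty).groupBy(_.id)
      for {
        l <- leftParts.getOrElse(p, Seq.empty)
        r <- built.getOrElse(l.id, Seq.empty)
      } yield (l, r)
    }
  }

  def main(args: Array[String]): Unit = {
    // Made-up data: the IDs of t2 are a subset of the IDs of t1.
    val t1 = (1 to 10).map(i => Row(i, i * 0.5))
    val t2 = (1 to 10 by 2).map(i => Row(i, i * 2.0))
    val joined = shuffledHashJoin(t1, t2, numPartitions = 4)
    println(s"join count = ${joined.size}, t2 count = ${t2.size}")
  }
}
```

In this sketch the join count equals the row count of t2, which is the
behaviour expected from the query above regardless of the join algorithm.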





--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/ShuffledHashJoin-Possible-Issue-tp14672.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
