[
https://issues.apache.org/jira/browse/SPARK-31148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tanapol Nearunchorn updated SPARK-31148:
----------------------------------------
Description:
Given this query
{code:java}
select r.id
from sqoop_wongnai.wongnai__w_ref r
inner join sqoop_wongnai.wongnai__w_photo p on p.root_referrer_id = r.id
inner join tests.restaurants_id_export_photos i on i.restaurant_id =
r.restaurant_id
where r.restaurant_id = 360
{code}
Spark return nothing except when I switch inner join statement between line 3
and 4 there are results returned.
I guarantee that we have data that match the join condition above.
Here's result from explain extend of above query:
{code:java}
== Parsed Logical Plan ==
'Project ['r.id]
+- 'Filter ('r.restaurant_id = 360)
+- 'Join Inner, ('i.restaurant_id = 'r.restaurant_id)
:- 'Join Inner, ('p.root_referrer_id = 'r.id)
: :- 'SubqueryAlias `r`
: : +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_ref`
: +- 'SubqueryAlias `p`
: +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_photo`
+- 'SubqueryAlias `i`
+- 'UnresolvedRelation `tests`.`restaurants_id_export_photos`
== Analyzed Logical Plan ==
id: bigint
Project [id#1834834L]
+- Filter (restaurant_id#1834836L = cast(360 as bigint))
+- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
:- Join Inner, (root_referrer_id#1834858L = id#1834834L)
: :- SubqueryAlias `r`
: : +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_ref`
: : +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`,
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#1834834L,
restaurant_id#1834836L,]
: +- SubqueryAlias `p`
: +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_photo`
: +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_photo`,
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#1834858L, ...
14 more fields]
+- SubqueryAlias `i`
+- SubqueryAlias `tests`.`restaurants_id_export_photos`
+- HiveTableRelation `tests`.`restaurants_id_export_photos`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]==
Optimized Logical Plan ==
Project [id#1834834L]
+- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
:- Project [id#1834834L, restaurant_id#1834836L]
: +- Join Inner, (root_referrer_id#1834858L = id#1834834L)
: :- Project [id#1834834L, restaurant_id#1834836L]
: : +- Filter ((isnotnull(restaurant_id#1834836L) &&
(restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
: : +- InMemoryRelation [id#1834834L, restaurant_id#1834836L],
StorageLevel(disk, memory, deserialized, 1 replicas)
: : +- Scan hive sqoop_wongnai.wongnai__w_ref [id#87357L,
restaurant_id#87359L], HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`,
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#87357L, restaurant_id#87359L]
: +- Project [root_referrer_id#1834858L]
: +- Filter (isnotnull(root_referrer_id#1834858L) &&
bloomfilter#1835088 of [id#1834834L] filtering [root_referrer_id#1834858L])
: +- InMemoryRelation [root_referrer_id#1834858L],
StorageLevel(disk, memory, deserialized, 1 replicas)
: +- Scan hive sqoop_wongnai.wongnai__w_photo
[root_referrer_id#82906L], HiveTableRelation
`sqoop_wongnai`.`wongnai__w_photo`,
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
+- Filter ((isnotnull(restaurant_id#1834876L) && (restaurant_id#1834876L =
360)) && bloomfilter#1835087 of [restaurant_id#1834836L] filtering
[restaurant_id#1834876L])
+- HiveTableRelation `tests`.`restaurants_id_export_photos`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]
== Physical Plan ==
*(7) Project [id#1834834L]
+- *(7) SortMergeJoin [restaurant_id#1834836L], [restaurant_id#1834876L], Inner
:- *(5) Sort [restaurant_id#1834836L ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(restaurant_id#1834836L, 200)
: +- *(4) Project [id#1834834L, restaurant_id#1834836L]
: +- *(4) SortMergeJoin [id#1834834L], [root_referrer_id#1834858L],
Inner
: :- *(2) Sort [id#1834834L ASC NULLS FIRST], false, 0
: : +- Exchange hashpartitioning(id#1834834L, 200)
: : +- *(1) Filter ((isnotnull(restaurant_id#1834836L) &&
(restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
: : +- *(1) InMemoryTableScan [id#1834834L,
restaurant_id#1834836L], [isnotnull(restaurant_id#1834836L),
(restaurant_id#1834836L = 360), isnotnull(id#1834834L)]
: : +- InMemoryRelation [id#1834834L,
restaurant_id#1834836L], StorageLevel(disk, memory, deserialized, 1 replicas)
: : +- Scan hive sqoop_wongnai.wongnai__w_ref
[id#87357L, restaurant_id#87359L], HiveTableRelation
`sqoop_wongnai`.`wongnai__w_ref`, org.apache.hadoop.hive.serde2.avro.AvroSerDe,
[id#87357L, restaurant_id#87359L]
: +- *(3) Sort [root_referrer_id#1834858L ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(root_referrer_id#1834858L, 200)
: +- Filter (isnotnull(root_referrer_id#1834858L) &&
bloomfilter#1835088 of [bf1835088 id#1834834L estimatedNumRows=1457534]
filtering [root_referrer_id#1834858L])
: : +- GenerateBloomFilter
: : +- ReusedExchange [id#1834834L,
restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
: +- InMemoryTableScan [root_referrer_id#1834858L],
[isnotnull(root_referrer_id#1834858L), bloomfilter#1835088 of [bf1835088
id#1834834L estimatedNumRows=1457534] filtering [root_referrer_id#1834858L]]
: :- InMemoryRelation [root_referrer_id#1834858L],
StorageLevel(disk, memory, deserialized, 1 replicas)
: : +- Scan hive sqoop_wongnai.wongnai__w_photo
[root_referrer_id#82906L], HiveTableRelation
`sqoop_wongnai`.`wongnai__w_photo`,
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
: +- GenerateBloomFilter
: +- ReusedExchange [id#1834834L,
restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
+- *(6) Sort [restaurant_id#1834876L ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(restaurant_id#1834876L, 200)
+- Filter ((isnotnull(restaurant_id#1834876L) &&
(restaurant_id#1834876L = 360)) && bloomfilter#1835087 of [bf1835088
id#1834834L estimatedNumRows=1457534] filtering [restaurant_id#1834876L])
: +- GenerateBloomFilter
: +- ReusedExchange [id#1834834L, restaurant_id#1834836L],
Exchange hashpartitioning(id#1834834L, 200)
+- Scan hive tests.restaurants_id_export_photos
[restaurant_id#1834876L], HiveTableRelation
`tests`.`restaurants_id_export_photos`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]
{code}
I noticed that in physical plan
{code:java}
bloomfilter#1835087 of [bf1835088 id#1834834L estimatedNumRows=1457534]
filtering [restaurant_id#1834876L]
{code}
it use bf1835088 id#1834834L for filtering restaurant_id#1834876L and which is
different from join condition.
I think that the correct Physical query plan should use bf1835087
restaurant_id#1834836L for filtering restaurant_id#1834876L.
You can see full of query plan of both incorrect and correct in this link:
[https://gist.github.com/tanapoln/67a335858dede5eddd9d80659de60d72]
I don't know why switching inner join statement yield the different result.
I don't know it is a bug or I misunderstand join in Spark.
Please help.
was:
Given this query
{code:java}
select r.id
from sqoop_wongnai.wongnai__w_ref r
inner join sqoop_wongnai.wongnai__w_photo p on p.root_referrer_id = r.id
inner join tests.restaurants_id_export_photos i on i.restaurant_id =
r.restaurant_id
where r.restaurant_id = 360
{code}
Spark return nothing except when I switch inner join statement between line 3
and 4 there are results returned.
I guarantee that we have data that match the join condition above.
Here's result from explain extend of above query:
{code:java}
== Parsed Logical Plan ==
'Project ['r.id]
+- 'Filter ('r.restaurant_id = 360)
+- 'Join Inner, ('i.restaurant_id = 'r.restaurant_id)
:- 'Join Inner, ('p.root_referrer_id = 'r.id)
: :- 'SubqueryAlias `r`
: : +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_ref`
: +- 'SubqueryAlias `p`
: +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_photo`
+- 'SubqueryAlias `i`
+- 'UnresolvedRelation `tests`.`restaurants_id_export_photos`
== Analyzed Logical Plan ==
id: bigint
Project [id#1834834L]
+- Filter (restaurant_id#1834836L = cast(360 as bigint))
+- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
:- Join Inner, (root_referrer_id#1834858L = id#1834834L)
: :- SubqueryAlias `r`
: : +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_ref`
: : +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`,
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#1834834L,
restaurant_id#1834836L,]
: +- SubqueryAlias `p`
: +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_photo`
: +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_photo`,
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#1834858L, ...
14 more fields]
+- SubqueryAlias `i`
+- SubqueryAlias `tests`.`restaurants_id_export_photos`
+- HiveTableRelation `tests`.`restaurants_id_export_photos`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]==
Optimized Logical Plan ==
Project [id#1834834L]
+- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
:- Project [id#1834834L, restaurant_id#1834836L]
: +- Join Inner, (root_referrer_id#1834858L = id#1834834L)
: :- Project [id#1834834L, restaurant_id#1834836L]
: : +- Filter ((isnotnull(restaurant_id#1834836L) &&
(restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
: : +- InMemoryRelation [id#1834834L, restaurant_id#1834836L],
StorageLevel(disk, memory, deserialized, 1 replicas)
: : +- Scan hive sqoop_wongnai.wongnai__w_ref [id#87357L,
restaurant_id#87359L], HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`,
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#87357L, restaurant_id#87359L]
: +- Project [root_referrer_id#1834858L]
: +- Filter (isnotnull(root_referrer_id#1834858L) &&
bloomfilter#1835088 of [id#1834834L] filtering [root_referrer_id#1834858L])
: +- InMemoryRelation [root_referrer_id#1834858L],
StorageLevel(disk, memory, deserialized, 1 replicas)
: +- Scan hive sqoop_wongnai.wongnai__w_photo
[root_referrer_id#82906L], HiveTableRelation
`sqoop_wongnai`.`wongnai__w_photo`,
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
+- Filter ((isnotnull(restaurant_id#1834876L) && (restaurant_id#1834876L =
360)) && bloomfilter#1835087 of [restaurant_id#1834836L] filtering
[restaurant_id#1834876L])
+- HiveTableRelation `tests`.`restaurants_id_export_photos`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]
== Physical Plan ==
*(7) Project [id#1834834L]
+- *(7) SortMergeJoin [restaurant_id#1834836L], [restaurant_id#1834876L], Inner
:- *(5) Sort [restaurant_id#1834836L ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(restaurant_id#1834836L, 200)
: +- *(4) Project [id#1834834L, restaurant_id#1834836L]
: +- *(4) SortMergeJoin [id#1834834L], [root_referrer_id#1834858L],
Inner
: :- *(2) Sort [id#1834834L ASC NULLS FIRST], false, 0
: : +- Exchange hashpartitioning(id#1834834L, 200)
: : +- *(1) Filter ((isnotnull(restaurant_id#1834836L) &&
(restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
: : +- *(1) InMemoryTableScan [id#1834834L,
restaurant_id#1834836L], [isnotnull(restaurant_id#1834836L),
(restaurant_id#1834836L = 360), isnotnull(id#1834834L)]
: : +- InMemoryRelation [id#1834834L,
restaurant_id#1834836L], StorageLevel(disk, memory, deserialized, 1 replicas)
: : +- Scan hive sqoop_wongnai.wongnai__w_ref
[id#87357L, restaurant_id#87359L], HiveTableRelation
`sqoop_wongnai`.`wongnai__w_ref`, org.apache.hadoop.hive.serde2.avro.AvroSerDe,
[id#87357L, restaurant_id#87359L]
: +- *(3) Sort [root_referrer_id#1834858L ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(root_referrer_id#1834858L, 200)
: +- Filter (isnotnull(root_referrer_id#1834858L) &&
bloomfilter#1835088 of [bf1835088 id#1834834L estimatedNumRows=1457534]
filtering [root_referrer_id#1834858L])
: : +- GenerateBloomFilter
: : +- ReusedExchange [id#1834834L,
restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
: +- InMemoryTableScan [root_referrer_id#1834858L],
[isnotnull(root_referrer_id#1834858L), bloomfilter#1835088 of [bf1835088
id#1834834L estimatedNumRows=1457534] filtering [root_referrer_id#1834858L]]
: :- InMemoryRelation [root_referrer_id#1834858L],
StorageLevel(disk, memory, deserialized, 1 replicas)
: : +- Scan hive sqoop_wongnai.wongnai__w_photo
[root_referrer_id#82906L], HiveTableRelation
`sqoop_wongnai`.`wongnai__w_photo`,
org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
: +- GenerateBloomFilter
: +- ReusedExchange [id#1834834L,
restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
+- *(6) Sort [restaurant_id#1834876L ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(restaurant_id#1834876L, 200)
+- Filter ((isnotnull(restaurant_id#1834876L) &&
(restaurant_id#1834876L = 360)) && bloomfilter#1835087 of [bf1835088
id#1834834L estimatedNumRows=1457534] filtering [restaurant_id#1834876L])
: +- GenerateBloomFilter
: +- ReusedExchange [id#1834834L, restaurant_id#1834836L],
Exchange hashpartitioning(id#1834834L, 200)
+- Scan hive tests.restaurants_id_export_photos
[restaurant_id#1834876L], HiveTableRelation
`tests`.`restaurants_id_export_photos`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]
{code}
I noticed that in physical plan
{code:java}
bloomfilter#1835087 of [bf1835088 id#1834834L estimatedNumRows=1457534]
filtering [restaurant_id#1834876L]
{code}
it use bf1835088 id#1834834L for filtering restaurant_id#1834876L and which is
different from join condition.
I think that the correct Physical query plan should use bf1835087
restaurant_id#1834836L for filtering restaurant_id#1834876L.
You can see full of query plan of both incorrect and correct in this link:
[https://gist.github.com/tanapoln/67a335858dede5eddd9d80659de60d72]
Please help.
> Physical Plan and Optimized Logical Plan are mismatch
> -----------------------------------------------------
>
> Key: SPARK-31148
> URL: https://issues.apache.org/jira/browse/SPARK-31148
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Tanapol Nearunchorn
> Priority: Major
>
> Given this query
>
> {code:java}
> select r.id
> from sqoop_wongnai.wongnai__w_ref r
> inner join sqoop_wongnai.wongnai__w_photo p on p.root_referrer_id = r.id
> inner join tests.restaurants_id_export_photos i on i.restaurant_id =
> r.restaurant_id
> where r.restaurant_id = 360
> {code}
> Spark return nothing except when I switch inner join statement between line 3
> and 4 there are results returned.
> I guarantee that we have data that match the join condition above.
> Here's result from explain extend of above query:
>
> {code:java}
> == Parsed Logical Plan ==
> 'Project ['r.id]
> +- 'Filter ('r.restaurant_id = 360)
> +- 'Join Inner, ('i.restaurant_id = 'r.restaurant_id)
> :- 'Join Inner, ('p.root_referrer_id = 'r.id)
> : :- 'SubqueryAlias `r`
> : : +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_ref`
> : +- 'SubqueryAlias `p`
> : +- 'UnresolvedRelation `sqoop_wongnai`.`wongnai__w_photo`
> +- 'SubqueryAlias `i`
> +- 'UnresolvedRelation `tests`.`restaurants_id_export_photos`
> == Analyzed Logical Plan ==
> id: bigint
> Project [id#1834834L]
> +- Filter (restaurant_id#1834836L = cast(360 as bigint))
> +- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
> :- Join Inner, (root_referrer_id#1834858L = id#1834834L)
> : :- SubqueryAlias `r`
> : : +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_ref`
> : : +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`,
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#1834834L,
> restaurant_id#1834836L,]
> : +- SubqueryAlias `p`
> : +- SubqueryAlias `sqoop_wongnai`.`wongnai__w_photo`
> : +- HiveTableRelation `sqoop_wongnai`.`wongnai__w_photo`,
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#1834858L, ...
> 14 more fields]
> +- SubqueryAlias `i`
> +- SubqueryAlias `tests`.`restaurants_id_export_photos`
> +- HiveTableRelation `tests`.`restaurants_id_export_photos`,
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> [restaurant_id#1834876L]== Optimized Logical Plan ==
> Project [id#1834834L]
> +- Join Inner, (restaurant_id#1834876L = restaurant_id#1834836L)
> :- Project [id#1834834L, restaurant_id#1834836L]
> : +- Join Inner, (root_referrer_id#1834858L = id#1834834L)
> : :- Project [id#1834834L, restaurant_id#1834836L]
> : : +- Filter ((isnotnull(restaurant_id#1834836L) &&
> (restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
> : : +- InMemoryRelation [id#1834834L, restaurant_id#1834836L],
> StorageLevel(disk, memory, deserialized, 1 replicas)
> : : +- Scan hive sqoop_wongnai.wongnai__w_ref [id#87357L,
> restaurant_id#87359L], HiveTableRelation `sqoop_wongnai`.`wongnai__w_ref`,
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#87357L,
> restaurant_id#87359L]
> : +- Project [root_referrer_id#1834858L]
> : +- Filter (isnotnull(root_referrer_id#1834858L) &&
> bloomfilter#1835088 of [id#1834834L] filtering [root_referrer_id#1834858L])
> : +- InMemoryRelation [root_referrer_id#1834858L],
> StorageLevel(disk, memory, deserialized, 1 replicas)
> : +- Scan hive sqoop_wongnai.wongnai__w_photo
> [root_referrer_id#82906L], HiveTableRelation
> `sqoop_wongnai`.`wongnai__w_photo`,
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
> +- Filter ((isnotnull(restaurant_id#1834876L) && (restaurant_id#1834876L =
> 360)) && bloomfilter#1835087 of [restaurant_id#1834836L] filtering
> [restaurant_id#1834876L])
> +- HiveTableRelation `tests`.`restaurants_id_export_photos`,
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]
> == Physical Plan ==
> *(7) Project [id#1834834L]
> +- *(7) SortMergeJoin [restaurant_id#1834836L], [restaurant_id#1834876L],
> Inner
> :- *(5) Sort [restaurant_id#1834836L ASC NULLS FIRST], false, 0
> : +- Exchange hashpartitioning(restaurant_id#1834836L, 200)
> : +- *(4) Project [id#1834834L, restaurant_id#1834836L]
> : +- *(4) SortMergeJoin [id#1834834L], [root_referrer_id#1834858L],
> Inner
> : :- *(2) Sort [id#1834834L ASC NULLS FIRST], false, 0
> : : +- Exchange hashpartitioning(id#1834834L, 200)
> : : +- *(1) Filter ((isnotnull(restaurant_id#1834836L) &&
> (restaurant_id#1834836L = 360)) && isnotnull(id#1834834L))
> : : +- *(1) InMemoryTableScan [id#1834834L,
> restaurant_id#1834836L], [isnotnull(restaurant_id#1834836L),
> (restaurant_id#1834836L = 360), isnotnull(id#1834834L)]
> : : +- InMemoryRelation [id#1834834L,
> restaurant_id#1834836L], StorageLevel(disk, memory, deserialized, 1 replicas)
> : : +- Scan hive sqoop_wongnai.wongnai__w_ref
> [id#87357L, restaurant_id#87359L], HiveTableRelation
> `sqoop_wongnai`.`wongnai__w_ref`,
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [id#87357L,
> restaurant_id#87359L]
> : +- *(3) Sort [root_referrer_id#1834858L ASC NULLS FIRST],
> false, 0
> : +- Exchange hashpartitioning(root_referrer_id#1834858L, 200)
> : +- Filter (isnotnull(root_referrer_id#1834858L) &&
> bloomfilter#1835088 of [bf1835088 id#1834834L estimatedNumRows=1457534]
> filtering [root_referrer_id#1834858L])
> : : +- GenerateBloomFilter
> : : +- ReusedExchange [id#1834834L,
> restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
> : +- InMemoryTableScan [root_referrer_id#1834858L],
> [isnotnull(root_referrer_id#1834858L), bloomfilter#1835088 of [bf1835088
> id#1834834L estimatedNumRows=1457534] filtering [root_referrer_id#1834858L]]
> : :- InMemoryRelation
> [root_referrer_id#1834858L], StorageLevel(disk, memory, deserialized, 1
> replicas)
> : : +- Scan hive
> sqoop_wongnai.wongnai__w_photo [root_referrer_id#82906L], HiveTableRelation
> `sqoop_wongnai`.`wongnai__w_photo`,
> org.apache.hadoop.hive.serde2.avro.AvroSerDe, [root_referrer_id#82906L]
> : +- GenerateBloomFilter
> : +- ReusedExchange [id#1834834L,
> restaurant_id#1834836L], Exchange hashpartitioning(id#1834834L, 200)
> +- *(6) Sort [restaurant_id#1834876L ASC NULLS FIRST], false, 0
> +- Exchange hashpartitioning(restaurant_id#1834876L, 200)
> +- Filter ((isnotnull(restaurant_id#1834876L) &&
> (restaurant_id#1834876L = 360)) && bloomfilter#1835087 of [bf1835088
> id#1834834L estimatedNumRows=1457534] filtering [restaurant_id#1834876L])
> : +- GenerateBloomFilter
> : +- ReusedExchange [id#1834834L, restaurant_id#1834836L],
> Exchange hashpartitioning(id#1834834L, 200)
> +- Scan hive tests.restaurants_id_export_photos
> [restaurant_id#1834876L], HiveTableRelation
> `tests`.`restaurants_id_export_photos`,
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [restaurant_id#1834876L]
> {code}
>
>
>
> I noticed that in physical plan
>
> {code:java}
> bloomfilter#1835087 of [bf1835088 id#1834834L estimatedNumRows=1457534]
> filtering [restaurant_id#1834876L]
> {code}
> it use bf1835088 id#1834834L for filtering restaurant_id#1834876L and which
> is different from join condition.
> I think that the correct Physical query plan should use bf1835087
> restaurant_id#1834836L for filtering restaurant_id#1834876L.
>
> You can see full of query plan of both incorrect and correct in this link:
> [https://gist.github.com/tanapoln/67a335858dede5eddd9d80659de60d72]
>
> I don't know why switching inner join statement yield the different result.
> I don't know it is a bug or I misunderstand join in Spark.
> Please help.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]