Leonid Chistov created SPARK-43339:
--------------------------------------
Summary: LEFT JOIN is treated as INNER JOIN when being in a middle of double join
Key: SPARK-43339
URL: https://issues.apache.org/jira/browse/SPARK-43339
Project: Spark
Issue Type: Bug
Components: Optimizer
Affects Versions: 3.4.0
Reporter: Leonid Chistov
Consider a query like:
{code:java}
SELECT ss_item_sk
FROM store_sales
LEFT OUTER JOIN store_returns
ON ( sr_item_sk = ss_item_sk ),
reason
WHERE sr_reason_sk = r_reason_sk
AND r_reason_desc = 'reason 38'
{code}
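Spark generates the following plan, in which the LEFT OUTER JOIN between store_sales and store_returns appears as an Inner BroadcastHashJoin: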
{code:java}
AdaptiveSparkPlan isFinalPlan=false
+- Project [ss_item_sk#2]
   +- BroadcastHashJoin [sr_reason_sk#458], [r_reason_sk#734], Inner, BuildRight, false
      :- Project [ss_item_sk#2, sr_reason_sk#458]
      :  +- BroadcastHashJoin [ss_item_sk#2], [sr_item_sk#452], Inner, BuildRight, false
      :     :- FileScan parquet [ss_item_sk#2] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/store_sales], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ss_item_sk:int>
      :     +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#7227]
      :        +- Filter (isnotnull(sr_item_sk#452) AND isnotnull(sr_reason_sk#458))
      :           +- FileScan parquet [sr_item_sk#452,sr_reason_sk#458] Batched: true, DataFilters: [isnotnull(sr_item_sk#452), isnotnull(sr_reason_sk#458)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/store_returns], PartitionFilters: [], PushedFilters: [IsNotNull(sr_item_sk), IsNotNull(sr_reason_sk)], ReadSchema: struct<sr_item_sk:int,sr_reason_sk:int>
      +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#7231]
         +- Project [r_reason_sk#734]
            +- Filter ((isnotnull(r_reason_desc#736) AND (r_reason_desc#736 = reason 38)) AND isnotnull(r_reason_sk#734))
               +- FileScan parquet [r_reason_sk#734,r_reason_desc#736] Batched: true, DataFilters: [isnotnull(r_reason_desc#736), (r_reason_desc#736 = reason 38), isnotnull(r_reason_sk#734)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/reason], PartitionFilters: [], PushedFilters: [IsNotNull(r_reason_desc), EqualTo(r_reason_desc,reason 38), IsNotNull(r_reason_sk)], ReadSchema: struct<r_reason_sk:int,r_reason_desc:string>
{code}
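To compare what the query returns with LEFT OUTER JOIN versus INNER JOIN outside of Spark, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. The tables contain only the columns the query touches, and the rows are hypothetical sample values, not TPC-DS data. The helper name run is also illustrative. Note that under SQL's three-valued logic, the predicate sr_reason_sk = r_reason_sk evaluates to NULL (not true) for store_sales rows that the outer join pads with NULLs, so the WHERE clause discards those rows. SQLite does not reproduce Spark's optimizer or plan; the sketch only compares result sets on this sample data.

{code:python}
import sqlite3

# Hypothetical sample rows (not TPC-DS data); only the columns the query uses.
STORE_SALES = [(1,), (2,), (3,)]                 # ss_item_sk
STORE_RETURNS = [(1, 10), (2, None)]             # sr_item_sk, sr_reason_sk
REASON = [(10, "reason 38"), (20, "reason 7")]   # r_reason_sk, r_reason_desc

# The reported query, with the first join keyword left as a placeholder.
QUERY_TEMPLATE = """
SELECT ss_item_sk
FROM store_sales
{join} store_returns
ON (sr_item_sk = ss_item_sk),
reason
WHERE sr_reason_sk = r_reason_sk
AND r_reason_desc = 'reason 38'
ORDER BY ss_item_sk
"""


def run(join_kind):
    """Run the query with the given join keyword; return the ss_item_sk values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE store_sales (ss_item_sk INTEGER)")
        conn.execute("CREATE TABLE store_returns (sr_item_sk INTEGER, sr_reason_sk INTEGER)")
        conn.execute("CREATE TABLE reason (r_reason_sk INTEGER, r_reason_desc TEXT)")
        conn.executemany("INSERT INTO store_sales VALUES (?)", STORE_SALES)
        conn.executemany("INSERT INTO store_returns VALUES (?, ?)", STORE_RETURNS)
        conn.executemany("INSERT INTO reason VALUES (?, ?)", REASON)
        rows = conn.execute(QUERY_TEMPLATE.format(join=join_kind)).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print("LEFT OUTER:", run("LEFT OUTER JOIN"))
    print("INNER:     ", run("INNER JOIN"))
{code}

On these rows both calls return [1]: item 2 (return with a NULL reason) and item 3 (no matching return, so NULL-padded by the outer join) are removed by the WHERE predicate in either form.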