Leonid Chistov created SPARK-43339:
--------------------------------------

             Summary: LEFT JOIN is treated as INNER JOIN when being in a middle of double join
                 Key: SPARK-43339
                 URL: https://issues.apache.org/jira/browse/SPARK-43339
             Project: Spark
          Issue Type: Bug
          Components: Optimizer
    Affects Versions: 3.4.0
            Reporter: Leonid Chistov


Consider a query like the following:

 
{code:sql}
SELECT ss_item_sk
FROM   store_sales
       LEFT OUTER JOIN store_returns
                    ON ( sr_item_sk = ss_item_sk ),
       reason
WHERE  sr_reason_sk = r_reason_sk
       AND r_reason_desc = 'reason 38'
{code}
 

Spark generates the following plan, in which both joins, including the one written as LEFT OUTER JOIN, appear as Inner:

 
{code:java}
AdaptiveSparkPlan isFinalPlan=false
+- Project [ss_item_sk#2]
   +- BroadcastHashJoin [sr_reason_sk#458], [r_reason_sk#734], Inner, BuildRight, false
      :- Project [ss_item_sk#2, sr_reason_sk#458]
      :  +- BroadcastHashJoin [ss_item_sk#2], [sr_item_sk#452], Inner, BuildRight, false
      :     :- FileScan parquet [ss_item_sk#2] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/store_sales], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ss_item_sk:int>
      :     +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#7227]
      :        +- Filter (isnotnull(sr_item_sk#452) AND isnotnull(sr_reason_sk#458))
      :           +- FileScan parquet [sr_item_sk#452,sr_reason_sk#458] Batched: true, DataFilters: [isnotnull(sr_item_sk#452), isnotnull(sr_reason_sk#458)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/store_returns], PartitionFilters: [], PushedFilters: [IsNotNull(sr_item_sk), IsNotNull(sr_reason_sk)], ReadSchema: struct<sr_item_sk:int,sr_reason_sk:int>
      +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#7231]
         +- Project [r_reason_sk#734]
            +- Filter ((isnotnull(r_reason_desc#736) AND (r_reason_desc#736 = reason 38)) AND isnotnull(r_reason_sk#734))
               +- FileScan parquet [r_reason_sk#734,r_reason_desc#736] Batched: true, DataFilters: [isnotnull(r_reason_desc#736), (r_reason_desc#736 = reason 38), isnotnull(r_reason_sk#734)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/reason], PartitionFilters: [], PushedFilters: [IsNotNull(r_reason_desc), EqualTo(r_reason_desc,reason 38), IsNotNull(r_reason_sk)], ReadSchema: struct<r_reason_sk:int,r_reason_desc:string>
{code}
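
To check whether the Inner join in the plan above changes the query result, it may help to compare the original query against an explicit INNER JOIN rewrite on a tiny data set. The following is only a minimal sketch: it uses Python's sqlite3 module as a stand-in engine (an assumption; it is not Spark and says nothing about which optimizer rule Spark applied), and the three toy tables and their rows are hypothetical, reusing only the column names from the query. On this data both forms return the same rows, because the WHERE predicate sr_reason_sk = r_reason_sk is not true for sales rows that the LEFT OUTER JOIN padded with NULLs.

```python
import sqlite3

# Hypothetical toy data: item 3 has no matching row in store_returns,
# so the LEFT OUTER JOIN pads it with NULLs.
SETUP = """
CREATE TABLE store_sales   (ss_item_sk INTEGER);
CREATE TABLE store_returns (sr_item_sk INTEGER, sr_reason_sk INTEGER);
CREATE TABLE reason        (r_reason_sk INTEGER, r_reason_desc TEXT);
INSERT INTO store_sales   VALUES (1), (2), (3);
INSERT INTO store_returns VALUES (1, 38), (2, 7);
INSERT INTO reason        VALUES (38, 'reason 38'), (7, 'reason 7');
"""

# The query from the report, unchanged.
LEFT_QUERY = """
SELECT ss_item_sk
FROM   store_sales
       LEFT OUTER JOIN store_returns
                    ON ( sr_item_sk = ss_item_sk ),
       reason
WHERE  sr_reason_sk = r_reason_sk
       AND r_reason_desc = 'reason 38'
"""

# The same query with the outer join written as an inner join.
INNER_QUERY = """
SELECT ss_item_sk
FROM   store_sales
       INNER JOIN store_returns
               ON ( sr_item_sk = ss_item_sk ),
       reason
WHERE  sr_reason_sk = r_reason_sk
       AND r_reason_desc = 'reason 38'
"""


def run_queries():
    """Run both query forms on the toy data and return their sorted rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        left_rows = sorted(conn.execute(LEFT_QUERY).fetchall())
        inner_rows = sorted(conn.execute(INNER_QUERY).fetchall())
    finally:
        conn.close()
    return left_rows, inner_rows


if __name__ == "__main__":
    left_rows, inner_rows = run_queries()
    print("LEFT OUTER JOIN form:", left_rows)
    print("INNER JOIN form:     ", inner_rows)
```

If the two forms ever differ on real data, the differing rows would be a concrete reproduction to attach to this issue.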
 

 


