[GitHub] [arrow-datafusion] mingmwang commented on pull request #4923: Support filter pushdown for semi/anti join

GitBox Wed, 18 Jan 2023 00:16:53 -0800


mingmwang commented on PR #4923:
URL: https://github.com/apache/arrow-datafusion/pull/4923#issuecomment-1386652044


   @alamb @ygf11 
   
   Just sharing Spark SQL's result for comparison:
   
   ```sql
   explain extended
   SELECT t1_id, t1_name FROM t1 LEFT SEMI JOIN t2 ON (t1_id = t2_id and t2_id >= 100 and t1_id >= 100);
   ```
   
   ```text
   == Optimized Logical Plan ==
   Join LeftSemi, (t1_id#1987050 = t2_id#1987346), Statistics(sizeInBytes=1.0 B)
   :- Filter (isnotnull(t1_id#1987050) AND (t1_id#1987050 >= 100)), Statistics(sizeInBytes=1.0 B)
   :  +- Relation access_views.t1[t1_id#1987050,t1_name#1987051] parquet, Statistics(sizeInBytes=0.0 B)
   +- Project [t2_id#1987346], Statistics(sizeInBytes=1.0 B)
      +- Filter (isnotnull(t2_id#1987346) AND (t2_id#1987346 >= 100)), Statistics(sizeInBytes=1.0 B)
         +- Relation access_views.t2[t2_id#1987346,t2_name#1987347] parquet, Statistics(sizeInBytes=0.0 B)
   
   == Physical Plan ==
   AdaptiveSparkPlan isFinalPlan=false
   +- BroadcastHashJoin [t1_id#1987050], [t2_id#1987346], LeftSemi, BuildRight, false
      :- Project [t1_id#1987050, t1_name#1987051]
      :  +- Filter (isnotnull(t1_id#1987050) AND (t1_id#1987050 >= 100))
      :     +- FileScan parquet access_views.t1[t1_id#1987050,t1_name#1987051] Batched: true, DataFilters: [isnotnull(t1_id#1987050), (t1_id#1987050 >= 100)], Format: Parquet, Location: InMemoryFileIndex[viewfs://hermes-rno/tmp/spark/[email protected]/temp-d6be7dc2-736c-4205-8b..., PartitionFilters: [], PushedFilters: [IsNotNull(t1_id), GreaterThanOrEqual(t1_id,100)], ReadSchema: struct<t1_id:int,t1_name:string>, UsedIndexes: []
      +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#301830]
         +- Project [t2_id#1987346]
            +- Filter (isnotnull(t2_id#1987346) AND (t2_id#1987346 >= 100))
               +- FileScan parquet access_views.t2[t2_id#1987346] Batched: true, DataFilters: [isnotnull(t2_id#1987346), (t2_id#1987346 >= 100)], Format: Parquet, Location: InMemoryFileIndex[viewfs://hermes-rno/tmp/spark/[email protected]/temp-d6be7dc2-736c-4205-8b..., PartitionFilters: [], PushedFilters: [IsNotNull(t2_id), GreaterThanOrEqual(t2_id,100)], ReadSchema: struct<t2_id:int>, UsedIndexes: []
   
   ```
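   In both plans Spark splits the ON clause: `t1_id >= 100` (reads only the left side) becomes a Filter under t1, `t2_id >= 100` (reads only the right side) becomes a Filter under t2, and only `t1_id = t2_id` stays as the join condition. Spark also infers `isnotnull` on both join keys; that part is not modeled below. Here is a minimal Python sketch of the split, assuming a toy in-memory row model. `Conjunct`, `split_on_clause`, `pushed_down_join` and the data are hypothetical, not Spark or DataFusion APIs. It also covers LEFT ANTI, where (as far as I can tell from the semantics) only right-only conjuncts can be pushed: a left row failing a left-only conjunct has no match, so an anti join must still return it. The join itself is a nested loop, not the BroadcastHashJoin Spark chose.

   ```python
   from dataclasses import dataclass
   from typing import Callable, Dict, List, Optional

   Row = Dict[str, Optional[object]]


   @dataclass(frozen=True)
   class Conjunct:
       """One AND-ed piece of the ON clause (hypothetical toy model)."""
       name: str
       cols: frozenset              # columns the conjunct reads
       fn: Callable[[Row], bool]    # NULL/unknown is collapsed to False


   def ge(col: str, v: int) -> Callable[[Row], bool]:
       # SQL `col >= v`: a NULL operand never satisfies the predicate
       return lambda row: row[col] is not None and row[col] >= v


   def split_on_clause(conjuncts, left_cols, right_cols, anti=False):
       """Return (push_left, push_right, keep_in_join)."""
       push_left, push_right, keep = [], [], []
       for c in conjuncts:
           if c.cols <= right_cols:
               # Right-only: a right row failing it can never match, so
               # filtering it out early is safe for SEMI and ANTI alike.
               push_right.append(c)
           elif c.cols <= left_cols and not anti:
               # Left-only: under SEMI a left row failing it is never emitted.
               # Under ANTI that row has no match and IS emitted, so keep it.
               push_left.append(c)
           else:
               keep.append(c)
       return push_left, push_right, keep


   def nested_loop_join(left, right, conjuncts, anti=False) -> List[Row]:
       """Reference LEFT SEMI / LEFT ANTI semantics with the given ON conjuncts."""
       out = []
       for l in left:
           matched = any(all(c.fn({**l, **r}) for c in conjuncts) for r in right)
           if matched != anti:  # SEMI keeps matched rows, ANTI keeps unmatched
               out.append(l)
       return out


   def pushed_down_join(left, right, conjuncts, left_cols, right_cols, anti=False):
       """Filter each input with its pushed conjuncts, then join on the rest."""
       push_left, push_right, keep = split_on_clause(conjuncts, left_cols, right_cols, anti)
       left_f = [l for l in left if all(c.fn(l) for c in push_left)]
       right_f = [r for r in right if all(c.fn(r) for c in push_right)]
       return nested_loop_join(left_f, right_f, keep, anti)


   # Toy data shaped like t1/t2 in the query above (values are made up).
   t1 = [{"t1_id": i, "t1_name": f"a{i}"} for i in (None, 50, 100, 150, 200)]
   t2 = [{"t2_id": i, "t2_name": f"b{i}"} for i in (None, 50, 150, 300)]
   LEFT_COLS = frozenset({"t1_id", "t1_name"})
   RIGHT_COLS = frozenset({"t2_id", "t2_name"})

   # ON (t1_id = t2_id and t2_id >= 100 and t1_id >= 100)
   ON_CLAUSE = [
       Conjunct("t1_id = t2_id", frozenset({"t1_id", "t2_id"}),
                lambda r: r["t1_id"] is not None and r["t1_id"] == r["t2_id"]),
       Conjunct("t2_id >= 100", frozenset({"t2_id"}), ge("t2_id", 100)),
       Conjunct("t1_id >= 100", frozenset({"t1_id"}), ge("t1_id", 100)),
   ]

   if __name__ == "__main__":
       for anti in (False, True):
           ref = nested_loop_join(t1, t2, ON_CLAUSE, anti)
           opt = pushed_down_join(t1, t2, ON_CLAUSE, LEFT_COLS, RIGHT_COLS, anti)
           assert ref == opt
           print("ANTI" if anti else "SEMI", [r["t1_id"] for r in opt])
       # prints: SEMI [150]
       #         ANTI [None, 50, 100, 200]
   ```

   If `t1_id >= 100` were also pushed into t1 for the anti join, the rows with `t1_id` NULL and 50 would be lost and the result would shrink to `[100, 200]`, which is why the sketch keeps left-only conjuncts in the join condition for ANTI.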

