[ https://issues.apache.org/jira/browse/SPARK-30768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-30768:
----------------------------------
Affects Version/s: 3.1.0 (was: 3.0.0)
> Constraints inferred from inequality attributes
> -----------------------------------------------
>
> Key: SPARK-30768
> URL: https://issues.apache.org/jira/browse/SPARK-30768
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Yuming Wang
> Priority: Major
>
> How to reproduce:
> {code:sql}
> create table SPARK_30768_1(c1 int, c2 int);
> create table SPARK_30768_2(c1 int, c2 int);
> {code}
> *Spark SQL*:
> {noformat}
> spark-sql> explain select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2 on (t1.c1 > t2.c1) where t1.c1 = 3;
> == Physical Plan ==
> *(3) Project [c1#5, c2#6]
> +- BroadcastNestedLoopJoin BuildRight, Inner, (c1#5 > c1#7)
>    :- *(1) Project [c1#5, c2#6]
>    :  +- *(1) Filter (isnotnull(c1#5) AND (c1#5 = 3))
>    :     +- *(1) ColumnarToRow
>    :        +- FileScan parquet default.spark_30768_1[c1#5,c2#6] Batched: true, DataFilters: [isnotnull(c1#5), (c1#5 = 3)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., PartitionFilters: [], PushedFilters: [IsNotNull(c1), EqualTo(c1,3)], ReadSchema: struct<c1:int,c2:int>
>    +- BroadcastExchange IdentityBroadcastMode, [id=#60]
>       +- *(2) Project [c1#7]
>          +- *(2) Filter isnotnull(c1#7)
>             +- *(2) ColumnarToRow
>                +- FileScan parquet default.spark_30768_2[c1#7] Batched: true, DataFilters: [isnotnull(c1#7)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., PartitionFilters: [], PushedFilters: [IsNotNull(c1)], ReadSchema: struct<c1:int>
> {noformat}
> *Hive* supports this feature:
> {noformat}
> hive> explain select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2 on (t1.c1 > t2.c1) where t1.c1 = 3;
> Warning: Map Join MAPJOIN[13][bigTable=?] in task 'Stage-3:MAPRED' is a cross product
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         $hdt$_0:t1
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         $hdt$_0:t1
>           TableScan
>             alias: t1
>             filterExpr: (c1 = 3) (type: boolean)
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Filter Operator
>               predicate: (c1 = 3) (type: boolean)
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               Select Operator
>                 expressions: c2 (type: int)
>                 outputColumnNames: _col1
>                 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                 HashTable Sink Operator
>                   keys:
>                     0
>                     1
>   Stage: Stage-3
>     Map Reduce
>       Map Operator Tree:
>         TableScan
>           alias: t2
>           filterExpr: (c1 < 3) (type: boolean)
>           Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>           Filter Operator
>             predicate: (c1 < 3) (type: boolean)
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Select Operator
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               Map Join Operator
>                 condition map:
>                   Inner Join 0 to 1
>                 keys:
>                   0
>                   1
>                 outputColumnNames: _col1
>                 Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
>                 Select Operator
>                   expressions: 3 (type: int), _col1 (type: int)
>                   outputColumnNames: _col0, _col1
>                   Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
>                     table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>       Execution mode: vectorized
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> Time taken: 5.491 seconds, Fetched: 71 row(s)
> {noformat}
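
The Hive plan above filters t2 with (c1 < 3), a predicate that follows from the join condition t1.c1 > t2.c1 combined with the filter t1.c1 = 3, while the Spark plan only pushes isnotnull(c1#7) down to t2. The following is a minimal sketch, not Spark or Hive code: it uses Python's built-in sqlite3 module as a stand-in engine, and the sample rows and the helper name run_queries are made up for illustration. It checks that adding the inferred predicate by hand leaves the query result unchanged.

```python
import sqlite3

# Hypothetical sample rows (c1, c2); the NULL keys must be dropped by both queries.
T1_ROWS = [(3, 10), (3, 11), (5, 12), (None, 13)]
T2_ROWS = [(1, 0), (2, 0), (3, 0), (4, 0), (None, 0)]

# The query from the issue description.
ORIGINAL_SQL = """
    select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2
    on (t1.c1 > t2.c1) where t1.c1 = 3
"""

# Same query with the constraint inferred from (t1.c1 > t2.c1) and (t1.c1 = 3).
REWRITTEN_SQL = """
    select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2
    on (t1.c1 > t2.c1) where t1.c1 = 3 and t2.c1 < 3
"""


def run_queries(rows_t1, rows_t2):
    """Load both tables into an in-memory database and run both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table SPARK_30768_1(c1 int, c2 int)")
    conn.execute("create table SPARK_30768_2(c1 int, c2 int)")
    conn.executemany("insert into SPARK_30768_1 values (?, ?)", rows_t1)
    conn.executemany("insert into SPARK_30768_2 values (?, ?)", rows_t2)
    original = sorted(conn.execute(ORIGINAL_SQL).fetchall())
    rewritten = sorted(conn.execute(REWRITTEN_SQL).fetchall())
    conn.close()
    return original, rewritten


ORIGINAL_ROWS, REWRITTEN_ROWS = run_queries(T1_ROWS, T2_ROWS)
print(ORIGINAL_ROWS == REWRITTEN_ROWS)
```

Because the extra predicate is implied by the existing ones, it cannot remove any row from the result; its only effect is to let the engine filter t2 before the nested-loop join, which is what the Hive plan does.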