[
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140588#comment-17140588
]
Attila Magyar commented on HIVE-23723:
--------------------------------------
[~jcamachorodriguez], thanks for letting me know, I hadn't realized that. Any
idea why it is disabled by default?
Also, the plan looks different with limittranspose enabled, and I'm not sure
why. There are 3 Limit operators. The first one (in Map 1) is the one that was
pushed through the LOJ, but there are 2 more in Reducer 2.
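For anyone reproducing this: as far as I know, the rule is switched on through
hive.optimize.limittranspose, so a session setup along these lines should be
enough before running the explain. The two threshold properties are quoted from
memory of HiveConf, and the values shown are illustrative assumptions, so
please check them against your Hive version.
{code:sql}
-- Enable transposing LIMIT through outer joins (disabled by default).
set hive.optimize.limittranspose=true;
-- Assumed thresholds controlling when the rule fires; values are illustrative.
set hive.optimize.limittranspose.reductionpercentage=1.0;
set hive.optimize.limittranspose.reductiontuples=0;
{code}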
{code:java}
explain
SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key
= src2.key) LIMIT 5; {code}
{code:java}
PREHOOK: query: explain
SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key
= src2.key) LIMIT 5
PREHOOK: type: QUERY
PREHOOK: Input: default@src
#### A masked pattern was here ####
POSTHOOK: query: explain
SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key
= src2.key) LIMIT 5
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
#### A masked pattern was here ####
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1

STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
Reducer 3 <- Map 4 (SIMPLE_EDGE), Reducer 2 (SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: src1
Statistics: Num rows: 500 Data size: 43500 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Statistics: Num rows: 500 Data size: 43500 Basic stats:
COMPLETE Column stats: COMPLETE
Limit
Number of rows: 5
Statistics: Num rows: 5 Data size: 435 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
null sort order:
sort order:
Statistics: Num rows: 5 Data size: 435 Basic stats:
COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.3
value expressions: _col0 (type: string)
Execution mode: vectorized, llap
LLAP IO: no inputs
Map 4
Map Operator Tree:
TableScan
alias: src2
filterExpr: key is not null (type: boolean)
Statistics: Num rows: 500 Data size: 89000 Basic stats:
COMPLETE Column stats: COMPLETE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 500 Data size: 89000 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: key (type: string), value (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 500 Data size: 89000 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 500 Data size: 89000 Basic stats:
COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: string)
Execution mode: vectorized, llap
LLAP IO: no inputs
Reducer 2
Execution mode: vectorized, llap
Reduce Operator Tree:
Limit
Number of rows: 5
Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
expressions: VALUE._col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE
Column stats: COMPLETE
Limit
Number of rows: 5
Statistics: Num rows: 5 Data size: 435 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 5 Data size: 435 Basic stats:
COMPLETE Column stats: COMPLETE
Reducer 3
Execution mode: llap
Reduce Operator Tree:
Merge Join Operator
condition map:
Left Outer Join 0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col2
Statistics: Num rows: 12 Data size: 1772 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
expressions: _col0 (type: string), _col2 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 12 Data size: 1772 Basic stats:
COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 12 Data size: 1772 Basic stats:
COMPLETE Column stats: COMPLETE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Stage: Stage-0
Fetch Operator
limit: 5
Processor Tree:
ListSink {code}
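If I read the plan right, the Limit in Map 1 is the one that ended up below the
join, which is roughly what a hand rewrite like the one below would express.
This is only a sketch for illustration, not what the optimizer literally
generates, and the alias s1 is made up. The outer LIMIT has to stay because
each of the 5 left rows can match more than one src2 row (the join estimate
above is 12 rows).
{code:sql}
-- Hand-written approximation of the transposed plan (illustrative only).
SELECT s1.key, src2.value
FROM (SELECT key FROM src LIMIT 5) s1            -- limit applied before the join
LEFT OUTER JOIN src src2 ON (s1.key = src2.key)
LIMIT 5;                                         -- still needed: the join can fan out
{code}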
> Limit operator pushdown through LOJ
> -----------------------------------
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Reporter: Attila Magyar
> Assignee: Attila Magyar
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23723.1.patch
>
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT
> OUTER JOINs.