[
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16046080#comment-16046080
]
Vineet Garg commented on HIVE-6348:
-----------------------------------
[~ashutoshc] The plan generated after the subquery-remove rule/decorrelation
doesn't contain a HiveSortLimit on top of another HiveSortLimit. For example,
for the query
{code:sql}
select * from part where p_size IN (select p_size from part p where p.p_type <> part.p_name
order by p_size)
{code}
the plan just after decorrelation looks like
{code:sql}
HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
  HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8], BLOCK__OFFSET__INSIDE__FILE=[$9], INPUT__FILE__NAME=[$10], ROW__ID=[$11])
    LogicalJoin(condition=[AND(<>($1, $13), =($5, $12))], joinType=[inner])
      HiveTableScan(table=[[default.part]], table:alias=[part])
      HiveAggregate(group=[{0, 1}])
        HiveProject(p_size=[$0], p_type0=[$1])
          HiveProject(p_size=[$0], p_type0=[$13])
            HiveSortLimit(sort0=[$0], dir0=[ASC-nulls-first])
              HiveProject(p_size=[$5], p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size1=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8], block__offset__inside__file=[$9], input__file__name=[$10], row__id=[$11], p_type0=[$4])
                LogicalFilter(condition=[IS NOT NULL($4)])
                  HiveTableScan(table=[[default.part]], table:alias=[p])
{code}
So there is only one HiveSortLimit, and it sits on the right side of the join.
One possible rule: if the top project doesn't project any column/expression
from the right side of the join, remove the HiveSortLimit from the right side.
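
To make the proposed rule concrete, here is a rough sketch on a toy plan tree in Python. It is not Hive or Calcite code: the Node class, its fields, and the function names are all made up for illustration. It also assumes that only a sort without a LIMIT is safe to drop, which goes beyond what the rule above states.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Toy stand-in for a plan node (hypothetical, not Calcite's RelNode)."""
    kind: str                                    # "Project", "Join", "Sort", ...
    inputs: List["Node"] = field(default_factory=list)
    width: int = 0                               # number of output columns
    refs: Set[int] = field(default_factory=set)  # Project: input columns it reads
    fetch: Optional[int] = None                  # Sort: LIMIT value, None = none


def strip_sorts(node: Node) -> Node:
    """Drop Sort nodes that have no LIMIT; leave LIMIT subtrees untouched."""
    if node.kind == "Sort" and node.fetch is not None:
        return node  # assumption: a LIMIT makes the ordering significant
    node.inputs = [strip_sorts(child) for child in node.inputs]
    if node.kind == "Sort":
        return node.inputs[0]
    return node


def remove_right_sort(project: Node) -> Node:
    """If a Project over a Join reads no right-side column, strip right-side sorts."""
    if project.kind != "Project" or not project.inputs:
        return project
    join = project.inputs[0]
    if join.kind != "Join":
        return project
    left, right = join.inputs
    # Join output is the left columns followed by the right columns.
    if any(index >= left.width for index in project.refs):
        return project
    join.inputs = [left, strip_sorts(right)]
    return project


def kinds(node: Node) -> List[str]:
    """Pre-order list of node kinds, handy for inspecting a plan."""
    out = [node.kind]
    for child in node.inputs:
        out.extend(kinds(child))
    return out


# The part of the plan above that sits under the outermost project.
right = Node("Aggregate", [
    Node("Project", [
        Node("Project", [
            Node("Sort", [
                Node("Project", [
                    Node("Filter", [Node("Scan", width=12)], width=12),
                ], width=14, refs=set(range(12))),
            ], width=14),
        ], width=2, refs={0, 13}),
    ], width=2, refs={0, 1}),
], width=2)
join = Node("Join", [Node("Scan", width=12), right], width=14)
plan = Node("Project", [join], width=12, refs=set(range(12)))

print(kinds(remove_right_sort(plan)))
```

In this sketch the project directly above the join reads only columns 0 to 11, all from the left input, so the sort under the aggregate is dropped. A real rule would have to be written against Calcite's rule API and Hive's operators, which this sketch does not attempt.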
> Order by/Sort by in subquery
> ----------------------------
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
> Issue Type: Bug
> Reporter: Gunther Hagleitner
> Assignee: Rui Li
> Priority: Minor
> Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch, HIVE-6348.3.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in Hive sorts the data set twice. The optimizer should probably remove any
> order by/sort by in the subquery unless you use 'limit '. It could even go so
> far as barring it at the semantic level.