[ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141631#comment-17141631
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-23723 at 6/22/20, 1:49 AM:
--------------------------------------------------------------------------

Iirc it was disabled by default because the rule pushes an exact limit, which 
means that it may introduce reducers throughout the plan, which could result 
in additional stages (as you see in the plan above). Thus, it was only 
triggered via a cost-based decision, because if we were not filtering much 
data, it could result in regressions. Until we could explore this further and 
tune the cost model, we decided to leave it disabled by default. Fwiw note that 
the rule can also push the limit through other operators, e.g., union.
It would be great if we could enable the rule, identify the additionally 
created {{limit}} operators with a {{topn}} label, and pass the top-n 
information via a hint to the Hive physical plan generation logic; this would 
also open a path to creating {{topNKey}} operators from the SQL statement, as 
[~gopalv] suggested at some point. However, I understand this may be out of 
the scope of this patch.

Concerning your patch, it seems you are removing the original limit on top of 
the left outer join? Note that you cannot remove it: if you have 5 input rows 
on the left side, you know the LOJ will produce at least 5 rows; however, you 
cannot guarantee the join will produce at most 5 rows, since a left row may 
match several right rows. The {{Fetch Operator}} with limit guarantees you 
get at most 5 rows, but since the rule matches on a {{Limit}} operator, that 
operator could be anywhere in the plan, e.g., if CBO pushes limit operators 
through other operators.
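
To make the point about the original limit concrete, here is a toy model in plain Python. It is not Hive code: the helpers {{left_outer_join}} and {{limit}} and the sample rows are hypothetical and only mimic the operator semantics. It sketches why the limit below the join can be added but the limit above it has to stay.

```python
from itertools import islice


def left_outer_join(left, right, key):
    """Naive LOJ: each left row is emitted once per match, or once with None."""
    out = []
    for l_row in left:
        matches = [r_row for r_row in right if r_row[key] == l_row[key]]
        if matches:
            out.extend((l_row, r_row) for r_row in matches)
        else:
            out.append((l_row, None))
    return out


def limit(rows, n):
    """Keep at most the first n rows."""
    return list(islice(rows, n))


# Hypothetical data: left id 0 has two matches on the right side.
left = [{"id": i} for i in range(10)]
right = [{"id": 0, "v": "a"}, {"id": 0, "v": "b"}, {"id": 1, "v": "c"}]

# Limit pushed below the join, original limit removed: too many rows.
pushed_only = left_outer_join(limit(left, 5), right, "id")

# Limit pushed below the join, original limit kept on top: at most 5 rows.
pushed_and_kept = limit(left_outer_join(limit(left, 5), right, "id"), 5)

print(len(pushed_only), len(pushed_and_kept))
```

With this data, the first plan returns 6 rows for a {{LIMIT 5}} query because one left row matches twice, while the second plan returns 5.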



> Limit operator pushdown through LOJ
> -----------------------------------
>
>                 Key: HIVE-23723
>                 URL: https://issues.apache.org/jira/browse/HIVE-23723
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-23723.1.patch
>
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT 
> OUTER JOINs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
