[ https://issues.apache.org/jira/browse/HIVE-23365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-23365:
-------------------------------------------
    Description: Currently, ReduceSink (RS) deduplication is always executed 
whenever it is semantically correct. However, it could be beneficial to leave 
both RS operators in the plan, e.g., if the number of distinct values (NDV) of 
the second RS is very low. Thus, we would like this decision to be cost-based. 
We could use a simple heuristic that would work fine for most cases without 
introducing regressions for existing cases, e.g., if the NDV of the partition 
column is less than the estimated parallelism of the second RS, do not execute 
deduplication.  (was: Currently, 
RS deduplication is always executed whenever it is semantically correct. 
However, it could be beneficial if t to leave both RS operators in the plan, 
e.g., if the NDV of the second RS is very low. Thus, we would like this 
decision to be cost-based. We could use a simple heuristic that would work fine 
for most of the cases without introducing regressions for existing cases, e.g., 
if NDV for partition column is less than estimated parallelism in the second 
RS, do not execute deduplication.)

> Put RS deduplication optimization under cost based decision
> -----------------------------------------------------------
>
>                 Key: HIVE-23365
>                 URL: https://issues.apache.org/jira/browse/HIVE-23365
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Priority: Major
>
> Currently, ReduceSink (RS) deduplication is always executed whenever it is 
> semantically correct. However, it could be beneficial to leave both RS 
> operators in the plan, e.g., if the number of distinct values (NDV) of the 
> second RS is very low. Thus, we would like this decision to be cost-based. We 
> could use a simple heuristic that would work fine for most cases without 
> introducing regressions for existing cases, e.g., if the NDV of the partition 
> column is less than the estimated parallelism of the second RS, do not 
> execute deduplication.
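
For illustration only, the proposed heuristic could look roughly like the Java sketch below. It is not Hive code: the class `RsDedupHeuristic`, the method `shouldDeduplicate`, and its parameters are hypothetical, and the sketch assumes the optimizer already has an NDV estimate for the partition column and an estimated reducer count for the second RS. The reasoning it encodes: assuming that, after deduplication, rows are hash-partitioned on that column, at most NDV reducers can receive rows, so when NDV is below the estimated parallelism, merging would leave reducers idle. When statistics are missing, it falls back to always deduplicating; that fallback is an assumption, chosen to match the current behavior and avoid regressions.

```java
import java.util.OptionalLong;

/**
 * Hypothetical sketch of the heuristic proposed in HIVE-23365.
 * Class and method names are illustrative, not Hive's actual API.
 */
public final class RsDedupHeuristic {

    private RsDedupHeuristic() {}

    /**
     * Decides whether two ReduceSink operators should be merged.
     *
     * @param partitionColumnNdv   estimated number of distinct values of the
     *                             partition column; empty if statistics are missing
     * @param estimatedParallelism estimated number of reducers for the second RS
     * @return true if deduplication should be applied
     */
    public static boolean shouldDeduplicate(OptionalLong partitionColumnNdv,
                                            int estimatedParallelism) {
        // Assumption: without statistics, keep today's behavior (always
        // deduplicate) so that existing plans do not regress.
        if (!partitionColumnNdv.isPresent()) {
            return true;
        }
        // Hash partitioning on a column with NDV distinct values sends rows to
        // at most NDV reducers; if that is fewer than the estimated
        // parallelism, keep both RS operators instead of merging them.
        return partitionColumnNdv.getAsLong() >= estimatedParallelism;
    }

    public static void main(String[] args) {
        System.out.println(shouldDeduplicate(OptionalLong.of(3), 100));      // false
        System.out.println(shouldDeduplicate(OptionalLong.of(10_000), 100)); // true
        System.out.println(shouldDeduplicate(OptionalLong.empty(), 100));    // true
    }
}
```

Comparing NDV against parallelism keeps the decision cheap and only diverges from the current behavior when statistics clearly indicate that the merged RS would under-use the available reducers.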



--
This message was sent by Atlassian Jira
(v8.3.4#803005)