[
https://issues.apache.org/jira/browse/HIVE-23365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jesus Camacho Rodriguez updated HIVE-23365:
-------------------------------------------
Description: Currently, RS (ReduceSink) deduplication is always executed
whenever it is semantically correct. However, it could be beneficial to leave
both RS operators in the plan, e.g., if the NDV (number of distinct values) of
the second RS is very low. Thus, we would like this decision to be cost-based.
We could use a simple heuristic that works well in most cases without
introducing regressions for existing ones, e.g., if the NDV of the partition
column is lower than the estimated parallelism of the second RS, do not execute
deduplication. (was: Currently,
RS deduplication is always executed whenever it is semantically correct.
However, it could be beneficial if t to leave both RS operators in the plan,
e.g., if the NDV of the second RS is very low. Thus, we would like this
decision to be cost-based. We could use a simple heuristic that would work fine
for most of the cases without introducing regressions for existing cases, e.g.,
if NDV for partition column is less than estimated parallelism in the second
RS, do not execute deduplication.)
> Put RS deduplication optimization under cost based decision
> -----------------------------------------------------------
>
> Key: HIVE-23365
> URL: https://issues.apache.org/jira/browse/HIVE-23365
> Project: Hive
> Issue Type: Improvement
> Components: Physical Optimizer
> Reporter: Jesus Camacho Rodriguez
> Priority: Major
>
> Currently, RS (ReduceSink) deduplication is always executed whenever it is
> semantically correct. However, it could be beneficial to leave both RS
> operators in the plan, e.g., if the NDV (number of distinct values) of the
> second RS is very low. Thus, we would like this decision to be cost-based.
> We could use a simple heuristic that works well in most cases without
> introducing regressions for existing ones, e.g., if the NDV of the partition
> column is lower than the estimated parallelism of the second RS, do not
> execute deduplication.
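
The heuristic proposed in the description could be sketched roughly as follows. This is only an illustration of the comparison, not Hive's actual optimizer code: the class name `RsDedupHeuristic`, the method `shouldDeduplicate`, its parameters, and the handling of missing statistics are all assumptions made for the example.

```java
/**
 * Hypothetical helper illustrating the proposed cost-based check.
 * Not part of Hive; all names here are assumptions for illustration.
 */
final class RsDedupHeuristic {

    private RsDedupHeuristic() {
    }

    /**
     * Decides whether two adjacent RS operators should be merged.
     *
     * @param partitionColumnNdv   estimated NDV of the partition column of
     *                             the second RS; negative if unknown
     * @param estimatedParallelism estimated parallelism of the second RS
     * @return true if deduplication should be executed
     */
    static boolean shouldDeduplicate(long partitionColumnNdv, int estimatedParallelism) {
        if (partitionColumnNdv < 0) {
            // Assumption: without statistics, keep the current behavior
            // (always deduplicate) so existing plans do not change.
            return true;
        }
        // Proposed rule: if the NDV is lower than the estimated parallelism,
        // do not execute deduplication and keep both RS operators.
        return partitionColumnNdv >= estimatedParallelism;
    }

    public static void main(String[] args) {
        System.out.println(shouldDeduplicate(3, 10));     // low NDV: keep both RS
        System.out.println(shouldDeduplicate(1000, 10));  // high NDV: deduplicate
        System.out.println(shouldDeduplicate(-1, 10));    // unknown NDV: deduplicate
    }
}
```

Falling back to deduplication when the NDV is unknown is one way to meet the "no regressions for existing cases" goal; a real implementation might choose differently.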
--
This message was sent by Atlassian Jira
(v8.3.4#803005)