Jesus Camacho Rodriguez created HIVE-13750:
----------------------------------------------

             Summary: Avoid additional shuffle stage created by Sorted Dynamic 
Partition Optimizer when possible
                 Key: HIVE-13750
                 URL: https://issues.apache.org/jira/browse/HIVE-13750
             Project: Hive
          Issue Type: Improvement
          Components: Physical Optimizer
    Affects Versions: 2.1.0
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez


Extend ReduceDedup to remove the additional shuffle stage created by the Sorted
Dynamic Partition Optimizer when possible, thus avoiding unnecessary work.

By [~ashutoshc]:
{quote}
Currently, if the config is on, the Sorted Dynamic Partition Optimizer (SDPO)
unconditionally adds an extra shuffle stage. If the sort columns of the previous
shuffle and the partitioning columns of the table match, the reduce sink
deduplication optimizer removes the extra shuffle stage, bringing the overhead
down to zero. However, if they don’t match, we end up doing an extra shuffle.
This can be improved, since we can add the table partition columns as sort
columns on the earlier shuffle and avoid this extra shuffle. This ensures that,
in cases where the query already has a shuffle stage, we are not shuffling the
data again.
{quote}
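
To make the three cases in the quote concrete, here is a toy model in Python. It is only a sketch and not Hive code: `plan_shuffles` and `extend_upstream` are hypothetical names, a shuffle is modelled purely by its sort columns, "match" is taken to mean identical column lists, and appending the missing partition columns at the end is an assumption of the sketch. The conditions under which the merge is actually legal ("when possible") are not modelled.

```python
from typing import List, Optional, Sequence, Tuple

# A shuffle stage, modelled only by its ordered sort columns (toy assumption).
Shuffle = Tuple[str, ...]


def plan_shuffles(
    upstream: Optional[Sequence[str]],
    partition_cols: Sequence[str],
    extend_upstream: bool,
) -> List[Shuffle]:
    """Return the shuffle stages left in the plan when SDPO is enabled.

    upstream        -- sort columns of a shuffle the query already has, or None
    partition_cols  -- partition columns of the target table
    extend_upstream -- False models the current behaviour, True the proposal
    """
    sdpo: Shuffle = tuple(partition_cols)
    if upstream is None:
        # No earlier shuffle to merge with: SDPO's shuffle is the only one.
        return [sdpo]

    prev: Shuffle = tuple(upstream)
    if prev == sdpo:
        # Columns match: deduplication drops the extra stage.
        return [prev]

    if extend_upstream:
        # Proposal: add the missing partition columns to the earlier shuffle
        # (appending them is an assumption of this sketch) and drop SDPO's stage.
        missing = tuple(c for c in sdpo if c not in prev)
        return [prev + missing]

    # Current behaviour: columns differ, so both shuffles remain.
    return [prev, sdpo]
```

With a query that already shuffles on a hypothetical column `key` and a table partitioned by `ds`, the model keeps two stages when `extend_upstream` is false and one stage when it is true.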



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
