Steve Carlin created HIVE-26283:
-----------------------------------

             Summary: Need better decision making for creating SortedDynPartitionOptimizer
                 Key: HIVE-26283
                 URL: https://issues.apache.org/jira/browse/HIVE-26283
             Project: Hive
          Issue Type: Bug
          Components: Logical Optimizer
            Reporter: Steve Carlin


When the hive.optimize.sort.dynamic.partition.threshold param is set to 0, the 
optimizer itself decides whether to create the SortedDynPartitionOptimizer.

In production, we've seen it make the wrong decision for a simple 
INSERT..SELECT into a partitioned table when the data being inserted is skewed 
towards one partition.

In this case, it still creates the SortedDynPartitionOptimizer.  This forces a 
reducer step, and all the data gets sent to the same reducer.

To reproduce this, you may also have to turn off "autogather" stats, since 
gathering them also creates a reducer step.

What we ultimately want is a mapper-only step, so that the load is evenly 
distributed across the mappers.
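
A minimal reproduction sketch of the setup described above, in HiveQL.  The 
table and column names (src_events, dst_events, part_key) are hypothetical and 
not from the production case, and src_events is assumed to be loaded with rows 
heavily skewed towards a single part_key value.  The "autogather" setting is 
assumed to refer to hive.stats.autogather and hive.stats.column.autogather.

```sql
-- Hypothetical tables; names and columns are assumptions for illustration.
CREATE TABLE src_events (id BIGINT, payload STRING, part_key STRING);
CREATE TABLE dst_events (id BIGINT, payload STRING)
  PARTITIONED BY (part_key STRING);

-- Allow a fully dynamic partition insert.
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Let the optimizer decide whether to add SortedDynPartitionOptimizer.
SET hive.optimize.sort.dynamic.partition.threshold=0;
-- Turn off stats autogather so it does not add a reducer step of its own.
SET hive.stats.autogather=false;
SET hive.stats.column.autogather=false;

-- Inspect the plan for the simple INSERT..SELECT.
EXPLAIN
INSERT INTO TABLE dst_events PARTITION (part_key)
SELECT id, payload, part_key FROM src_events;
```

With these settings, a reducer stage in the EXPLAIN output would suggest that 
the SortedDynPartitionOptimizer was still applied; a map-only plan is the 
outcome this issue asks for.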



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
