Steve Carlin created HIVE-26283:
-----------------------------------

             Summary: Need better decision making for creating SortedDynPartitionOptimizer
                 Key: HIVE-26283
                 URL: https://issues.apache.org/jira/browse/HIVE-26283
             Project: Hive
          Issue Type: Bug
          Components: Logical Optimizer
            Reporter: Steve Carlin
When hive.optimize.sort.dynamic.partition.threshold is set to 0, the optimizer makes a cost-based decision about whether to create the SortedDynPartitionOptimizer. In production, we have seen it make the wrong decision for a simple INSERT ... SELECT into a partitioned table when the inserted data is skewed towards one partition: it still creates the SortedDynPartitionOptimizer. This forces a reducer stage, and all of the data is sent to the same reducer.

To reproduce this, you may also need to turn off stats autogathering (hive.stats.autogather), since that also creates a reducer stage.

What we ultimately want in this case is a map-only plan, so that the load is distributed evenly across the mappers.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
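To make the decision space concrete, here is a minimal Java sketch of how a planner might choose between a sorted reduce stage and a map-only plan. The handling of -1 (disabled), 1 (always on) and other positive values (a writer-count threshold) follows the parameter's HiveConf description as we understand it. The cost-based branch for 0, along with the names decide, Plan, rowsPerPartition, maxWritersPerTask and maxSkewFraction, is a hypothetical skew-aware heuristic for illustration only, not Hive's actual SortedDynPartitionOptimizer logic.

```java
import java.util.Map;

public class SortedDynPartDecisionSketch {

    // Hypothetical plan choices: add a sorted reduce stage, or stay map-only.
    enum Plan { SORTED_REDUCE, MAP_ONLY }

    /**
     * Hypothetical decision helper (not a Hive API).
     * threshold mirrors hive.optimize.sort.dynamic.partition.threshold;
     * rowsPerPartition is an illustrative row-count estimate per target partition.
     */
    static Plan decide(int threshold, Map<String, Long> rowsPerPartition,
                       int maxWritersPerTask, double maxSkewFraction) {
        if (threshold < 0) {
            return Plan.MAP_ONLY;          // -1 disables the optimization
        }
        if (threshold == 1) {
            return Plan.SORTED_REDUCE;     // 1 always enables the optimization
        }
        int partitions = rowsPerPartition.size();
        if (threshold > 1) {
            // Positive threshold: sort only when the writer count would exceed it.
            return partitions > threshold ? Plan.SORTED_REDUCE : Plan.MAP_ONLY;
        }
        // threshold == 0: hypothetical cost-based check.
        if (partitions <= maxWritersPerTask) {
            return Plan.MAP_ONLY;          // each mapper can keep all writers open
        }
        long total = 0;
        long largest = 0;
        for (long rows : rowsPerPartition.values()) {
            total += rows;
            largest = Math.max(largest, rows);
        }
        // If one partition dominates, sorting funnels most rows into one reducer.
        if (total > 0 && (double) largest / total > maxSkewFraction) {
            return Plan.MAP_ONLY;
        }
        return Plan.SORTED_REDUCE;
    }

    public static void main(String[] args) {
        Map<String, Long> skewed = Map.of(
                "p=2022-01", 9_900L, "p=2022-02", 50L, "p=2022-03", 50L);
        System.out.println(decide(0, skewed, 2, 0.5));  // MAP_ONLY
    }
}
```

With the skewed input in main, three target partitions exceed the two-writer limit, but one partition holds 99% of the rows, so the sketch picks MAP_ONLY instead of sending nearly everything to a single reducer. The trade-off is that each mapper may keep more record writers open at once.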