[ 
https://issues.apache.org/jira/browse/HIVE-28489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877925#comment-17877925
 ] 

Sungwoo Park commented on HIVE-28489:
-------------------------------------

On the 10TB TPC-DS benchmark (tested with Hive 4 on MR3), the results are as follows:

query 18, before: 30.6s, after: 28.2s
query 22, before: 53.1s, after: 18.0s
query 67, before: 842.3s, after: 429.1s
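
For readers who want the relative improvement rather than raw seconds, a minimal sketch of the arithmetic follows. The timings are copied from the list above; the names `timings` and `speedup` are illustrative only.

```python
# Timings copied from the comment above: (seconds before, seconds after).
timings = {
    "query 18": (30.6, 28.2),
    "query 22": (53.1, 18.0),
    "query 67": (842.3, 429.1),
}


def speedup(before, after):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before / after


speedups = {query: speedup(b, a) for query, (b, a) in timings.items()}

for query, factor in speedups.items():
    print(f"{query}: {factor:.2f}x")
```

This works out to roughly 1.09x for query 18, 2.95x for query 22, and 1.96x for query 67.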


> Partitioning the input data of Grouping Set GroupBy operator
> ------------------------------------------------------------
>
>                 Key: HIVE-28489
>                 URL: https://issues.apache.org/jira/browse/HIVE-28489
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Seonggon Namgung
>            Assignee: Seonggon Namgung
>            Priority: Major
>         Attachments: 2.PartitionDataBeforeGroupingSet.pptx
>
>
> A GroupBy operator with grouping sets often emits too many rows, which becomes
> the bottleneck of query execution. To reduce the number of output rows, this
> JIRA proposes partitioning the input data of such GroupBy operators.
> Please check the attached slides for a detailed explanation.
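
The actual design is in the attached slides, which are not reproduced here. As a rough illustration of why partitioning the input can cut the number of emitted rows, the toy model below assumes that each task performs partial aggregation and emits one row per distinct (grouping set, key) pair it sees. The data, the task count, the grouping sets, and all function names are hypothetical and are not taken from Hive's code.

```python
from itertools import product

# Hypothetical input: every combination of 3 columns with 8 values each (512 rows).
rows = list(product(range(8), repeat=3))
NUM_TASKS = 4

# ROLLUP(a, b, c) written out as grouping sets: (a, b, c), (a, b), (a), ().
GROUPING_SETS = [(0, 1, 2), (0, 1), (0,), ()]


def emitted_rows(task_rows):
    """Count rows emitted by one task's partial aggregation (toy model):
    one row per distinct (grouping set, projected key) pair seen by the task."""
    keys = set()
    for row in task_rows:
        for gs_id, cols in enumerate(GROUPING_SETS):
            keys.add((gs_id, tuple(row[c] for c in cols)))
    return len(keys)


def total_emitted(assign):
    """Distribute rows to tasks with assign(index, row) and sum emitted rows."""
    tasks = [[] for _ in range(NUM_TASKS)]
    for i, row in enumerate(rows):
        tasks[assign(i, row)].append(row)
    return sum(emitted_rows(t) for t in tasks)


# Unpartitioned: rows land on tasks in arrival order (round-robin here).
unpartitioned = total_emitted(lambda i, row: i % NUM_TASKS)

# Partitioned: rows sharing the leading grouping column go to the same task.
partitioned = total_emitted(lambda i, row: row[0] % NUM_TASKS)

print("unpartitioned:", unpartitioned)
print("partitioned:  ", partitioned)
```

In this model the unpartitioned layout emits 804 rows and the partitioned layout emits 588, because the coarser grouping sets, (a, b) and (a), no longer repeat the same keys on every task. Real savings depend on the data distribution and on how Hive chooses the partitioning key.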



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
