[jira] [Commented] (HIVE-28488) Merge adjacent union distinct

Sungwoo Park (Jira) Thu, 29 Aug 2024 17:41:06 -0700


    [ https://issues.apache.org/jira/browse/HIVE-28488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877924#comment-17877924 ]


Sungwoo Park commented on HIVE-28488:
-------------------------------------

On the 10TB TPC-DS benchmark (tested with Hive 4 on MR3):

query 49, before: 26.1s, after: 25.3s
query 75, before: 224.2s, after: 204.8s

> Merge adjacent union distinct
> -----------------------------
>
>                 Key: HIVE-28488
>                 URL: https://issues.apache.org/jira/browse/HIVE-28488
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Seonggon Namgung
>            Assignee: Seonggon Namgung
>            Priority: Major
>         Attachments: 1.MergeAdjacentUnionDistinct.pptx
>
>
> Currently, Hive compiles
> "SELECT * FROM TBL1 UNION SELECT * FROM TBL2 UNION SELECT * FROM TBL3"
> to
> {code:java}
> TS - GBY - RS
> TS - GBY - RS - GBY - RS
>            TS - GBY - RS - GBY {code}
> This can be optimized as follows:
> {code:java}
> TS - GBY - RS
> TS - GBY - RS
> TS - GBY - RS - GBY {code}
> Please see the attached slides for a detailed explanation, and feel free
> to ask questions or share suggestions. I would also appreciate advice on a
> better place to implement this optimization (e.g. SemanticAnalyzer,
> Calcite).
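
The rewrite in the description relies on UNION DISTINCT being associative: deduplicating after every pairwise union gives the same rows as deduplicating once over all branches. The following is a minimal Python sketch of that equivalence, not Hive code. The helper names (distinct, nested_union_distinct, merged_union_distinct) and the sample rows are hypothetical, and distinct only models a GBY over all columns.

```python
def distinct(rows):
    """Model a GBY over all columns: drop duplicate rows, keep first-seen order."""
    return list(dict.fromkeys(rows))


def nested_union_distinct(t1, t2, t3):
    # Models the current plan: (t1 UNION t2) is deduplicated first,
    # then the result is unioned with t3 and deduplicated again.
    first = distinct(distinct(t1) + distinct(t2))
    return distinct(first + distinct(t3))


def merged_union_distinct(t1, t2, t3):
    # Models the proposed plan: each branch deduplicates locally,
    # then a single final deduplication covers all three branches.
    return distinct(distinct(t1) + distinct(t2) + distinct(t3))


# Hypothetical sample tables with duplicates inside and across tables.
tbl1 = [(1, "a"), (1, "a"), (2, "b")]
tbl2 = [(2, "b"), (3, "c")]
tbl3 = [(3, "c"), (4, "d"), (1, "a")]

nested = nested_union_distinct(tbl1, tbl2, tbl3)
merged = merged_union_distinct(tbl1, tbl2, tbl3)

# Row order is not significant for UNION DISTINCT, so compare sorted results.
assert sorted(nested) == sorted(merged)
```

The sketch only checks that the two plans return the same rows. It says nothing about where the saving comes from at runtime, which in the plans above is the removed intermediate GBY - RS stage.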



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
