[jira] [Work logged] (HIVE-20954) Vector RS operator is not using uniform hash function for TPC-DS query 95

ASF GitHub Bot (Jira) Tue, 09 Jun 2020 09:36:27 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-20954?focusedWorklogId=443141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-443141
 ]


ASF GitHub Bot logged work on HIVE-20954:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Jun/20 16:35
            Start Date: 09/Jun/20 16:35
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] commented on pull request #492:
URL: https://github.com/apache/hive/pull/492#issuecomment-641144189


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the [email protected] list if the patch is in 
need of reviews.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 443141)
    Remaining Estimate: 0h
            Time Spent: 10m

> Vector RS operator is not using uniform hash function for TPC-DS query 95
> -------------------------------------------------------------------------
>
>                 Key: HIVE-20954
>                 URL: https://issues.apache.org/jira/browse/HIVE-20954
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20954.1.patch, HIVE-20954.2.patch, 
> HIVE-20954.3.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
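> Row distribution is skewed in the dynamically partitioned hash join (DHJ), 
> which slows the query down. Both reduce sink (RS) branches emit the same 
> output, keyed and partitioned on the same column, but one uses 
> VectorReduceSinkObjectHashOperator and the other uses 
> VectorReduceSinkLongOperator.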
> {code}
> |                     Select Operator                |
> |                       expressions: ws_warehouse_sk (type: bigint), ws_order_number (type: bigint) |
> |                       outputColumnNames: _col0, _col1 |
> |                       Select Vectorization:        |
> |                           className: VectorSelectOperator |
> |                           native: true             |
> |                           projectedOutputColumnNums: [14, 16] |
> |                       Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
> |                       Reduce Output Operator       |
> |                         key expressions: _col1 (type: bigint) |
> |                         sort order: +              |
> |                         Map-reduce partition columns: _col1 (type: bigint) |
> |                         Reduce Sink Vectorization: |
> |                             className: VectorReduceSinkObjectHashOperator |
> |                             keyColumnNums: [16]    |
> |                             native: true           |
> |                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
> |                             partitionColumnNums: [16] |
> |                             valueColumnNums: [14]  |
> |                         Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
> |                         value expressions: _col0 (type: bigint) |
> |                       Reduce Output Operator       |
> |                         key expressions: _col1 (type: bigint) |
> |                         sort order: +              |
> |                         Map-reduce partition columns: _col1 (type: bigint) |
> |                         Reduce Sink Vectorization: |
> |                             className: VectorReduceSinkLongOperator |
> |                             keyColumnNums: [16]    |
> |                             native: true           |
> |                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
> |                             valueColumnNums: [14]  |
> |                         Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
> |                         value expressions: _col0 (type: bigint) |
> |             Execution mode: vectorized, llap       |
> {code}
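
To illustrate why the choice of hash function matters for the plan above, here is a minimal sketch of how two partitioners can spread the same bigint keys very differently. It is not Hive's code: the class `PartitionSkewSketch`, its methods, the partition count, and the key pattern are all hypothetical, and the mixing step borrows the 64-bit finalizer from MurmurHash3 only as an example of a well-mixing function.

```java
import java.util.Arrays;

public class PartitionSkewSketch {
    // Hypothetical partitioner A: fold the 64-bit key to 32 bits, then take the modulo.
    // For small non-negative keys this is effectively "key mod numPartitions".
    static int plainPartition(long key, int numPartitions) {
        return Math.floorMod(Long.hashCode(key), numPartitions);
    }

    // MurmurHash3 64-bit finalizer, used here only as an example of a mixing function.
    static long fmix64(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Hypothetical partitioner B: mix all key bits before taking the modulo.
    static int mixedPartition(long key, int numPartitions) {
        return (int) Math.floorMod(fmix64(key), (long) numPartitions);
    }

    // Count how many keys land in each partition.
    static int[] histogram(long[] keys, int numPartitions, boolean mixed) {
        int[] counts = new int[numPartitions];
        for (long key : keys) {
            int p = mixed ? mixedPartition(key, numPartitions)
                          : plainPartition(key, numPartitions);
            counts[p]++;
        }
        return counts;
    }

    // Number of partitions that received at least one key.
    static int nonEmpty(int[] counts) {
        int n = 0;
        for (int c : counts) {
            if (c > 0) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int numPartitions = 16;
        long[] keys = new long[1600];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 16L * i; // assumed key pattern: stride equals the partition count
        }
        System.out.println("plain: " + Arrays.toString(histogram(keys, numPartitions, false)));
        System.out.println("mixed: " + Arrays.toString(histogram(keys, numPartitions, true)));
    }
}
```

With this assumed key pattern the plain partitioner sends every key to partition 0, while the mixing partitioner spreads the keys across partitions. Whether either function matches what the two operators in the plan do is not stated in this issue.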



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
