[https://issues.apache.org/jira/browse/HIVE-20954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698459#comment-16698459]
Gopal V commented on HIVE-20954:
--------------------------------
To recap the changes, here are the compatibility matrices for deduplicating two reduce sinks (RS).
||RS_1||RS_2||Result||
|UNSET|UNSET| Dedup with UNSET|
|FIXED|FIXED| Dedup only if the number of reducers is the same|
|UNIFORM+AUTOPARALLEL|UNIFORM+AUTOPARALLEL| Dedup always (use the higher number of reducers)|
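As a rough illustration only, the same-trait rows above could be encoded as a small decision function. Everything in this sketch is hypothetical: {{SameTraitDedup}}, {{Trait}}, {{Merged}} and {{dedupSameTrait}} are made-up names rather than Hive's deduplication code, UNIFORM+AUTOPARALLEL is simplified to a single enum value, and because the table does not say which reducer count survives an UNSET/UNSET merge, the sketch assumes the larger one. It assumes Java 16 or later (for the record).

{code:java}
import java.util.Optional;

/**
 * Hypothetical sketch of the "same trait on both sides" rows of the
 * reduce sink (RS) dedup compatibility matrix. Names are illustrative only.
 */
public class SameTraitDedup {

    /** Simplification: UNIFORM+AUTOPARALLEL is modelled as one value. */
    public enum Trait { UNSET, FIXED, UNIFORM_AUTOPARALLEL }

    /** The trait and reducer count kept by the merged RS. */
    public record Merged(Trait trait, int numReducers) {}

    /** Returns the merged RS, or empty if the two RS must not be deduplicated. */
    public static Optional<Merged> dedupSameTrait(Trait trait, int reducers1, int reducers2) {
        return switch (trait) {
            // UNSET + UNSET: dedup and stay UNSET. The table does not say which
            // reducer count survives; this sketch assumes the larger one.
            case UNSET -> Optional.of(new Merged(Trait.UNSET, Math.max(reducers1, reducers2)));
            // FIXED + FIXED: dedup only if both sides fix the same reducer count.
            case FIXED -> reducers1 == reducers2
                    ? Optional.of(new Merged(Trait.FIXED, reducers1))
                    : Optional.<Merged>empty();
            // UNIFORM+AUTOPARALLEL on both sides: always dedup, keep the higher count.
            case UNIFORM_AUTOPARALLEL -> Optional.of(
                    new Merged(Trait.UNIFORM_AUTOPARALLEL, Math.max(reducers1, reducers2)));
        };
    }

    public static void main(String[] args) {
        System.out.println(dedupSameTrait(Trait.FIXED, 10, 10));
        System.out.println(dedupSameTrait(Trait.FIXED, 10, 20));
        System.out.println(dedupSameTrait(Trait.UNIFORM_AUTOPARALLEL, 10, 20));
    }
}
{code}

Returning an empty {{Optional}} for "no dedup" keeps the caller's decision explicit: it either gets the merged settings or leaves both reduce sinks in place.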
Those are the easy cases. Now for the mixed combinations (each row also applies with the two sides swapped):
||RS_1||RS_2||Result||
|UNSET|FIXED| Dedup with FIXED|
|UNSET|UNIFORM| Dedup with UNIFORM|
|UNSET|UNIFORM+AUTOPARALLEL| Dedup with UNIFORM|
|UNIFORM|UNIFORM+AUTOPARALLEL| Dedup with UNIFORM|
|UNIFORM|AUTOPARALLEL| No dedup|
|UNIFORM|FIXED| No dedup|
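The mixed rows can be sketched the same way. In this hypothetical version each side's traits are modelled as a set (so UNIFORM+AUTOPARALLEL is the set of UNIFORM and AUTOPARALLEL), and the lookup is tried in both argument orders so every row also holds with the two sides swapped. The class, constant and method names are illustrative, and rejecting any combination the table does not list is an assumption of the sketch, not stated behaviour.

{code:java}
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical sketch of the mixed-trait rows of the RS dedup compatibility
 * matrix. Each side's traits are an immutable set; names are illustrative only.
 */
public class MixedTraitDedup {

    public enum Trait { UNSET, FIXED, UNIFORM, AUTOPARALLEL }

    // Trait combinations that appear in the table.
    public static final Set<Trait> ONLY_UNSET = Set.of(Trait.UNSET);
    public static final Set<Trait> ONLY_FIXED = Set.of(Trait.FIXED);
    public static final Set<Trait> ONLY_UNIFORM = Set.of(Trait.UNIFORM);
    public static final Set<Trait> ONLY_AUTOPARALLEL = Set.of(Trait.AUTOPARALLEL);
    public static final Set<Trait> UNIFORM_AUTOPARALLEL = Set.of(Trait.UNIFORM, Trait.AUTOPARALLEL);

    /** Returns the merged trait set, or empty if the two RS must not be deduplicated. */
    public static Optional<Set<Trait>> dedup(Set<Trait> a, Set<Trait> b) {
        // Each row holds in both orders, so try (a, b) first, then (b, a).
        Optional<Set<Trait>> merged = lookup(a, b);
        return merged.isPresent() ? merged : lookup(b, a);
    }

    private static Optional<Set<Trait>> lookup(Set<Trait> a, Set<Trait> b) {
        if (a.equals(ONLY_UNSET)) {
            // UNSET + FIXED: dedup with FIXED.
            if (b.equals(ONLY_FIXED)) return Optional.of(ONLY_FIXED);
            // UNSET + UNIFORM or UNSET + UNIFORM+AUTOPARALLEL: dedup with UNIFORM.
            if (b.equals(ONLY_UNIFORM) || b.equals(UNIFORM_AUTOPARALLEL)) return Optional.of(ONLY_UNIFORM);
        }
        // UNIFORM + UNIFORM+AUTOPARALLEL: dedup with UNIFORM.
        if (a.equals(ONLY_UNIFORM) && b.equals(UNIFORM_AUTOPARALLEL)) return Optional.of(ONLY_UNIFORM);
        // UNIFORM + AUTOPARALLEL and UNIFORM + FIXED: no dedup. Pairs not in this
        // table (including same-trait pairs) are also rejected here, by assumption.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(dedup(ONLY_FIXED, ONLY_UNSET));
        System.out.println(dedup(UNIFORM_AUTOPARALLEL, ONLY_UNIFORM));
        System.out.println(dedup(ONLY_UNIFORM, ONLY_AUTOPARALLEL));
    }
}
{code}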
[~teddy.choi]: the patch LGTM, +1. In several queries, shared work is kicking in properly (i.e., reducers are getting removed).
The cbo_limit.q failure looks like a flaky test diff.
The others are failing with an odd NPE:
{code}
java.lang.NullPointerException
	at org.apache.hive.jdbc.BaseJdbcWithMiniLlap.tearDown(BaseJdbcWithMiniLlap.java:153)
{code}
Both failures look unrelated, but deserve their own follow-up bugs.
> Vector RS operator is not using uniform hash function for TPC-DS query 95
> -------------------------------------------------------------------------
>
> Key: HIVE-20954
> URL: https://issues.apache.org/jira/browse/HIVE-20954
> Project: Hive
> Issue Type: Improvement
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-20954.1.patch, HIVE-20954.2.patch
>
>
> The distribution of rows in the DHJ is skewed, causing a slowdown.
> Both branches have the same RS outputs, but one uses VectorReduceSinkObjectHashOperator
> and the other uses VectorReduceSinkLongOperator.
> {code}
> | Select Operator |
> | expressions: ws_warehouse_sk (type: bigint), ws_order_number (type: bigint) |
> | outputColumnNames: _col0, _col1 |
> | Select Vectorization: |
> | className: VectorSelectOperator |
> | native: true |
> | projectedOutputColumnNums: [14, 16] |
> | Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> | key expressions: _col1 (type: bigint) |
> | sort order: + |
> | Map-reduce partition columns: _col1 (type: bigint) |
> | Reduce Sink Vectorization: |
> | className: VectorReduceSinkObjectHashOperator |
> | keyColumnNums: [16] |
> | native: true |
> | nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
> | partitionColumnNums: [16] |
> | valueColumnNums: [14] |
> | Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
> | value expressions: _col0 (type: bigint) |
> | Reduce Output Operator |
> | key expressions: _col1 (type: bigint) |
> | sort order: + |
> | Map-reduce partition columns: _col1 (type: bigint) |
> | Reduce Sink Vectorization: |
> | className: VectorReduceSinkLongOperator |
> | keyColumnNums: [16] |
> | native: true |
> | nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
> | valueColumnNums: [14] |
> | Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
> | value expressions: _col0 (type: bigint) |
> | Execution mode: vectorized, llap |
> {code}
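To illustrate in general terms why the choice of hash function matters for skew, the sketch below routes a key column to reducers with {{Math.floorMod(hash, numReducers)}} under two hash functions. Neither is claimed to be what VectorReduceSinkLongOperator or VectorReduceSinkObjectHashOperator actually computes: {{weakHash}} is plain {{Long.hashCode}}, and {{mixingHash}} is the MurmurHash3 64-bit finalizer (fmix64). The evenly spaced keys from the hypothetical {{strideKeys}} helper stand in for any key column with a regular pattern.

{code:java}
import java.util.Arrays;
import java.util.function.LongToIntFunction;

/**
 * Illustrative sketch: how the hash function used to pick a reducer affects skew.
 * Neither hash is claimed to match Hive's vectorized ReduceSink operators.
 */
public class HashSkewDemo {

    /** Weak hash: Long.hashCode folds the high and low 32 bits and does no mixing. */
    public static int weakHash(long key) {
        return Long.hashCode(key);
    }

    /** MurmurHash3 64-bit finalizer (fmix64), truncated to an int. */
    public static int mixingHash(long key) {
        long h = key;
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return (int) h;
    }

    /** Hypothetical key column: n keys with a fixed stride (0, stride, 2 * stride, ...). */
    public static long[] strideKeys(int n, long stride) {
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = stride * i;
        }
        return keys;
    }

    /** Counts the rows each reducer receives, with reducer = floorMod(hash, numReducers). */
    public static int[] distribute(long[] keys, int numReducers, LongToIntFunction hash) {
        int[] counts = new int[numReducers];
        for (long key : keys) {
            counts[Math.floorMod(hash.applyAsInt(key), numReducers)]++;
        }
        return counts;
    }

    /** Number of reducers that receive at least one row. */
    public static int nonEmpty(int[] counts) {
        return (int) Arrays.stream(counts).filter(c -> c > 0).count();
    }

    public static void main(String[] args) {
        long[] keys = strideKeys(10_000, 4);
        int numReducers = 8;
        int[] weak = distribute(keys, numReducers, HashSkewDemo::weakHash);
        int[] mixed = distribute(keys, numReducers, HashSkewDemo::mixingHash);
        System.out.println("weak hash:   " + Arrays.toString(weak) + ", busy reducers = " + nonEmpty(weak));
        System.out.println("mixing hash: " + Arrays.toString(mixed) + ", busy reducers = " + nonEmpty(mixed));
    }
}
{code}

With 8 reducers and keys that are all multiples of 4, the weak hash sends every row to reducers 0 and 4 and leaves the other six idle, while the mixing hash spreads the rows over all eight. This is the general kind of skew a non-uniform hash function can cause.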
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)