[https://issues.apache.org/jira/browse/HIVE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075839#comment-16075839]
liyunzhang_intel commented on HIVE-17010:
-----------------------------------------
[~csun]:
The explain output of query17 without HIVE-17010.patch is attached at
[link|https://issues.apache.org/jira/secure/attachment/12875204/query17_explain.log].
Reducer 3's data size is 9223372036854775807 (i.e. Long.MAX_VALUE):
{code}
Reducer 3
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col28 (type: bigint), _col27 (type: bigint)
1 cs_bill_customer_sk (type: bigint), cs_item_sk (type:
bigint)
outputColumnNames: _col1, _col2, _col6, _col8, _col9, _col22,
_col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, _col82
Statistics: Num rows: 9223372036854775807 Data size:
9223372036854775807 Basic stats: COMPLETE Column stats: PARTIAL
Reduce Output Operator
key expressions: _col22 (type: bigint)
sort order: +
Map-reduce partition columns: _col22 (type: bigint)
Statistics: Num rows: 9223372036854775807 Data size:
9223372036854775807 Basic stats: COMPLETE Column stats: PARTIAL
value expressions: _col1 (type: bigint), _col2 (type:
bigint), _col6 (type: bigint), _col8 (type: bigint), _col9 (type: int), _col27
(type: bigint), _col28 (type: bigint), _col34 (type: bigint), _col35 (type:
int), _col45 (type: bigint), _col51 (type: bigint), _col63 (type: bigint),
_col66 (type: int), _col82
{code}
Map 9's data size is 1022672:
{code}
Map 9
Map Operator Tree:
TableScan
alias: d1
filterExpr: (d_date_sk is not null and (d_quarter_name =
'2000Q1')) (type: boolean)
Statistics: Num rows: 73049 Data size: 2045372 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: (d_date_sk is not null and (d_quarter_name =
'2000Q1')) (type: boolean)
Statistics: Num rows: 36524 Data size: 1022672 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: d_date_sk (type: bigint)
sort order: +
Map-reduce partition columns: d_date_sk (type: bigint)
Statistics: Num rows: 36524 Data size: 1022672 Basic
stats: COMPLETE Column stats: NONE
{code}
There is a join of Map 9 and Reducer 3:
{code}
Reducer 4 <- Map 9 (PARTITION-LEVEL SORT, 1), Reducer 3
(PARTITION-LEVEL SORT, 1)
{code}
The addition 9223372036854775807 + 1022672 overflows the Long type, which causes the problem.
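One way to avoid the wrap-around is to saturate at Long.MAX_VALUE instead of letting the sum overflow. The following is a minimal sketch of that idea (the class and method names here are hypothetical, for illustration only, and this is not the actual HIVE-17010 patch):

```java
// Hypothetical sketch: guard a running data-size accumulation against
// Long overflow by saturating at Long.MAX_VALUE rather than wrapping
// to a negative value.
public class SafeAddDemo {

    // Saturating add: returns Long.MAX_VALUE when a + b would overflow.
    static long safeAdd(long a, long b) {
        try {
            return Math.addExact(a, b); // throws ArithmeticException on overflow
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        long reducer3 = 9223372036854775807L; // Reducer 3 data size (Long.MAX_VALUE)
        long map9 = 1022672L;                 // Map 9 data size

        System.out.println(reducer3 + map9);         // wraps: -9223372036853753137
        System.out.println(safeAdd(reducer3, map9)); // saturates: 9223372036854775807
    }
}
```

With such a guard, the accumulated statistic stays a valid (if capped) upper bound, so the parallelism computation never sees a negative data size.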
> Fix the overflow problem of Long type in SetSparkReducerParallelism
> -------------------------------------------------------------------
>
> Key: HIVE-17010
> URL: https://issues.apache.org/jira/browse/HIVE-17010
> Project: Hive
> Issue Type: Bug
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Attachments: HIVE-17010.1.patch
>
>
> We use
> [numberOfBytes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129]
> to collect the numberOfBytes of the siblings of the specified RS. We use the
> Long type, and the addition overflows when the data is too big. When that
> happens, the parallelism is decided by
> [sparkMemoryAndCores.getSecond()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L184].
> If spark.dynamic.allocation.enabled is true, sparkMemoryAndCores.getSecond()
> is a dynamic value decided by the Spark runtime. For example, the value of
> sparkMemoryAndCores.getSecond() may be 5 or 15 randomly, and it may even
> be 1. The main problem here is the overflow of the addition of the Long
> type. You can reproduce the overflow problem with the following code:
> {code}
> public static void main(String[] args) {
>   long a1 = 9223372036854775807L;
>   long a2 = 1022672;
>   long res = a1 + a2;
>   System.out.println(res); // -9223372036853753137
>   BigInteger b1 = BigInteger.valueOf(a1);
>   BigInteger b2 = BigInteger.valueOf(a2);
>   BigInteger bigRes = b1.add(b2);
>   System.out.println(bigRes); // 9223372036855798479
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)