[ https://issues.apache.org/jira/browse/HIVE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075778#comment-16075778 ]

liyunzhang_intel commented on HIVE-17010:
-----------------------------------------

[~csun]: found the problem on 3TB data. Actually the biggest table of TPC-DS, 
"store_sales", does not exceed the max value of the Long type (2^63-1). But 
[TPC-DS/query17|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query17.sql]
 is a query with many joins. We use 
[numberOfBytes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129]
 to collect the numberOfBytes of the siblings of a specified RS. Here a sibling 
of the specified RS may be the result of a join of big tables, so the sum can 
exceed the max value of the Long type.
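
A minimal sketch of the failure mode, assuming a hypothetical siblingBytes list 
in place of the real operator statistics (the saturatingAdd helper is illustrative, 
not the attached patch): a plain long sum wraps around to a negative value, while 
a guarded add caps the total at Long.MAX_VALUE so the estimate stays usable.

{code}
import java.util.Arrays;
import java.util.List;

public class SiblingBytesSum {
    // Cap at Long.MAX_VALUE instead of wrapping around to a negative value.
    static long saturatingAdd(long a, long b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        // Per-sibling byte counts; a join output can be close to Long.MAX_VALUE.
        List<Long> siblingBytes = Arrays.asList(9223372036854775807L, 1022672L);
        long plain = 0;
        long guarded = 0;
        for (long bytes : siblingBytes) {
            plain += bytes;                          // wraps to -9223372036853753137
            guarded = saturatingAdd(guarded, bytes); // stays at 9223372036854775807
        }
        System.out.println(plain + " vs " + guarded);
    }
}
{code}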

> Fix the overflow problem of Long type in SetSparkReducerParallelism
> -------------------------------------------------------------------
>
>                 Key: HIVE-17010
>                 URL: https://issues.apache.org/jira/browse/HIVE-17010
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-17010.1.patch
>
>
> We use 
> [numberOfBytes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129]
>  to collect the numberOfBytes of the siblings of a specified RS. We use the 
> Long type, and it overflows when the data is too big. When this happens, the 
> parallelism is decided by 
> [sparkMemoryAndCores.getSecond()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L184]
>  if spark.dynamicAllocation.enabled is true. sparkMemoryAndCores.getSecond() 
> is a dynamic value decided by the Spark runtime; for example, it may be 5 or 
> 15 at random, and it may even be 1. The main problem here is the overflow of 
> the addition of Long values. You can reproduce the overflow with the 
> following code:
> {code}
> import java.math.BigInteger;
> 
> public class OverflowDemo {
>     public static void main(String[] args) {
>         long a1 = 9223372036854775807L;  // Long.MAX_VALUE
>         long a2 = 1022672;
>         long res = a1 + a2;
>         System.out.println(res);      // -9223372036853753137 (wrapped around)
>         BigInteger b1 = BigInteger.valueOf(a1);
>         BigInteger b2 = BigInteger.valueOf(a2);
>         BigInteger bigRes = b1.add(b2);
>         System.out.println(bigRes);   // 9223372036855798479 (exact sum)
>     }
> }
> {code}
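
For illustration only, a sketch of why the wrapped sum makes the estimate 
unusable and forces the fallback to the cluster-derived value: bytesPerReducer 
stands in for hive.exec.reducers.bytes.per.reducer, and numberOfBytes is the 
overflowed sum from the snippet above.

{code}
public class ReducerEstimate {
    public static void main(String[] args) {
        long bytesPerReducer = 256L * 1024 * 1024;   // stand-in for hive.exec.reducers.bytes.per.reducer
        long numberOfBytes = -9223372036853753137L;  // overflowed sibling sum
        // The estimated reducer count goes negative, so it cannot be used and
        // the parallelism is instead taken from sparkMemoryAndCores.getSecond().
        long reducers = (long) Math.ceil((double) numberOfBytes / bytesPerReducer);
        System.out.println(reducers); // a large negative number
    }
}
{code}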


