[ https://issues.apache.org/jira/browse/PIG-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197754#comment-14197754 ]

liyunzhang_intel commented on PIG-4265:
---------------------------------------

Thanks for [~xuefuz]'s comment. I made some mistakes in the previous bug
description. After further investigation, I found that the problem is probably
not the "Java double precision problems" in AlgebraicDoubleMathBase.java but
something else.
{code}
-- Ruby_UDFs.pig
a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
b = group a by name;
c = foreach b generate group, SUM(a.age), SUM(a.gpa);
d = foreach c generate $0, $1, (double)((int)$2*100)/100;
store d into 'RubyUDFs_10_benchmark.out';
{code}
When Ruby_UDFs.pig runs on Spark, SUM(a.gpa) is 16.0 and
(double)((int)$2*100)/100 yields "david steinbeck    266     16.0".
When it runs in MapReduce mode, SUM(a.gpa) is 15.999999999999998 and
(double)((int)$2*100)/100 yields "david steinbeck       266     15.0".
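The 15.0 comes from the cast rather than from the sum alone: in this expression the (int) cast applies to $2 before the multiplication and truncates toward zero, so 15.999999999999998 becomes 15 while 16.0 stays 16. Below is a minimal Java sketch of the same arithmetic. The class and method names are hypothetical, and the rounding variant is only one possible alternative (it rounds to the nearest hundredth instead of truncating), not something the script or Pig prescribes.

```java
// Hypothetical illustration class: mirrors the Pig expression
// (double)((int)$2*100)/100 with plain Java arithmetic.
public class CastTruncationDemo {

    // Same operator order as the Pig script: the (int) cast applies to the
    // value alone and truncates toward zero before multiplying by 100.
    static double pigStyle(double sum) {
        return (double) ((int) sum * 100) / 100;
    }

    // Assumed alternative, not part of the original script: scale first,
    // then round to the nearest integer, so a tiny error below 16 is absorbed.
    static double roundedToTwoDecimals(double sum) {
        return Math.round(sum * 100) / 100.0;
    }

    public static void main(String[] args) {
        double spark = 16.0;                   // SUM(a.gpa) reported on Spark
        double mapreduce = 15.999999999999998; // SUM(a.gpa) reported on MapReduce

        System.out.println(pigStyle(spark));                 // 16.0
        System.out.println(pigStyle(mapreduce));             // 15.0
        System.out.println(roundedToTwoDecimals(mapreduce)); // 16.0
    }
}
```

Note that rounding and truncation disagree for inputs whose third decimal digit is 5 or more, so which one is appropriate depends on what the benchmark expects.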

As [~xuefuz] said 
{quote}
 It's expected that a double value 16 is represented by a system as 
15.99999999999 or 16.000000001. The problem seems to be the casting of the 
double value to int
{quote}
I don't yet know why the same script returns different results on different
execution engines (Spark and MapReduce) on the same OS; I will investigate
further. I will rename the issue title to "SUM function returns different
values in Spark and MapReduce engines".
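One possible explanation, not yet confirmed by this investigation: floating-point addition is not associative, so if the two engines combine per-group partial sums in different orders (for example through map-side partial aggregation), the same seven values could sum to 16.0 in one case and 15.999999999999998 in the other. The minimal Java sketch below demonstrates the non-associativity with the well-known 0.1/0.2/0.3 example and prints the gpa values summed in two orders for comparison; the class name is hypothetical, and no claim is made here about which order either engine actually uses.

```java
// Hypothetical demo class: double addition is not associative, so the
// grouping or order in which partial sums are combined can change the result.
public class SumOrderDemo {

    // Sums left to right with a single accumulator.
    static double sumInOrder(double[] values) {
        double acc = 0.0;
        for (double v : values) {
            acc += v;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Same three operands, different grouping.
        double left = (0.1 + 0.2) + 0.3;  // 0.6000000000000001
        double right = 0.1 + (0.2 + 0.3); // 0.6
        System.out.println(left + " vs " + right);

        // gpa values of "david steinbeck" from studenttab10k,
        // summed in file order and in reverse order.
        double[] gpa = {2.44, 1.17, 1.94, 1.35, 2.77, 2.42, 3.91};
        double[] reversed = new double[gpa.length];
        for (int i = 0; i < gpa.length; i++) {
            reversed[i] = gpa[gpa.length - 1 - i];
        }
        System.out.println(sumInOrder(gpa) + " vs " + sumInOrder(reversed));
    }
}
```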

> AlgebraicDoubleMathBase has "Java double precision problems"
> ------------------------------------------------------------
>
>                 Key: PIG-4265
>                 URL: https://issues.apache.org/jira/browse/PIG-4265
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: PIG-4265.patch
>
>
> {code}
> $PIG_HOME/bin/pig -x local RubyUDFs_10.pig
> {code}
> {code}
> -- RubyUDFs_10.pig
> a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
> b = group a by name;
> c = foreach b generate group, SUM(a.age), SUM(a.gpa);
> d = foreach c generate $0, $1, (double)((int)$2*100)/100;
> store d into 'local.output/RubyUDFs_10_benchmark.out';
> {code}
> The result in RubyUDFs_10.out/part-r-00000, and the matching input rows in studenttab10k:
> {code}
> # grep "david s" RubyUDFs_10.out/part-r-00000
> david steinbeck       266     15.0
> # grep "david s" studenttab10k
> david steinbeck       21      2.44
> david steinbeck       33      1.17
> david steinbeck       42      1.94
> david steinbeck       42      1.35
> david steinbeck       31      2.77
> david steinbeck       40      2.42
> david steinbeck       57      3.91
> {code}
> When you sum all the gpa values of "david steinbeck" in the file "studenttab10k",
> the result is 16, while the result in RubyUDFs_10.out/part-r-00000 is 15.
> The reason is a double precision problem in AlgebraicDoubleMathBase.java:
> it sums all the gpa values to 15.999999..., and (double)((int)15.999999...*100)/100 =
> 15.0.
> {code}
> // AlgebraicDoubleMathBase.java
>     private static Double doWork(Double arg1, Double arg2, KNOWN_OP op) {
>         if (arg1 == null) {
>             return arg2;
>         } else if (arg2 == null) {
>             return arg1;
>         } else {
>             switch (op) {
>             case MAX: return Math.max(arg1, arg2);
>             case MIN: return Math.min(arg1, arg2);
>             case SUM: return arg1 + arg2;  // this line has the "Java double precision problem"
>             default: return null;
>             }
>         }
>     }
> {code}
> For details on the Java double precision problem, see
> https://community.oracle.com/thread/2448849?tstart=0
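For comparison, summing the same seven gpa values exactly with java.math.BigDecimal gives 16.00. BigDecimal.valueOf(double) converts through Double.toString, so a double parsed from "2.44" becomes exactly 2.44 before it is added, and exact BigDecimal addition does not depend on summation order. This is only a sketch of the idea under the assumption that exact decimal summation is wanted (at the cost of slower arithmetic); it is not necessarily what PIG-4265.patch does, and the class name is hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical demo class: exact decimal summation of the seven gpa values
// for "david steinbeck" listed in the issue description.
public class BigDecimalSumDemo {

    // BigDecimal.valueOf(double) uses Double.toString, so short decimal
    // inputs such as 2.44 are represented exactly before being added.
    static BigDecimal exactSum(double[] values) {
        BigDecimal acc = BigDecimal.ZERO;
        for (double v : values) {
            acc = acc.add(BigDecimal.valueOf(v));
        }
        return acc;
    }

    public static void main(String[] args) {
        double[] gpa = {2.44, 1.17, 1.94, 1.35, 2.77, 2.42, 3.91};
        BigDecimal sum = exactSum(gpa);
        System.out.println(sum);               // 16.00
        System.out.println(sum.doubleValue()); // 16.0
    }
}
```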



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
