[
https://issues.apache.org/jira/browse/PIG-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198461#comment-14198461
]
Xuefu Zhang commented on PIG-4265:
----------------------------------
The difference could be caused by the different compilers involved, Scala vs. Java.
Here is a link that demonstrates the effect a compiler can have on double precision:
http://stackoverflow.com/questions/18840730/different-behaviours-for-double-precision-on-different-compiler
Regardless, the values produced by different compilers should be close. The
real problem here is that the cast to int happens before the multiplication and
division, so a sum of 15.999999999999998 is truncated to 15 while 16.0 stays
16. The result might be different if you put the cast last.
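To make the ordering point concrete, here is a minimal sketch in Python rather than Pig. It assumes that Python's int() truncates toward zero the same way Pig's (int) cast does and that both use IEEE 754 doubles. The names cast_first and cast_last are hypothetical helpers, not part of Pig. With the cast first, the two sums reported in the issue come out as 15.0 and 16.0. With the cast applied after the multiplication, the two results should differ by no more than about 0.01.

```python
# Sketch only: Python stands in for Pig's (int) and (double) casts.
# Assumption: int() truncates toward zero like Pig's (int) cast, and
# Python floats are IEEE 754 doubles like Pig's double type.

def cast_first(total):
    """Mirror (double)((int)$2*100)/100: truncate, then scale."""
    return float(int(total) * 100) / 100


def cast_last(total):
    """Hypothetical reordering: scale by 100, truncate, then divide."""
    return float(int(total * 100)) / 100


mapreduce_sum = 15.999999999999998  # SUM(a.gpa) reported for mapreduce
spark_sum = 16.0                    # SUM(a.gpa) reported for spark

for label, total in (("mapreduce", mapreduce_sum), ("spark", spark_sum)):
    print(label, cast_first(total), cast_last(total))
```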
Also, comparing double values for exact equality is usually meaningless unless
an error threshold is given. Thus, 15.9999999999998 and 16.0000000001 should be
treated as equal when they are compared within such a threshold.
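A threshold comparison of this kind could look like the following sketch. The helper name approx_equal and the tolerance of 1e-9 are assumptions chosen for illustration. An appropriate threshold depends on the data being compared.

```python
def approx_equal(a, b, eps=1e-9):
    """Return True when a and b differ by at most eps (an assumed threshold)."""
    return abs(a - b) <= eps


# Exact comparison versus threshold comparison for the two values above.
print(15.9999999999998 == 16.0000000001)
print(approx_equal(15.9999999999998, 16.0000000001))
```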
> SUM functions returns different value in spark and mapreduce engine
> -------------------------------------------------------------------
>
> Key: PIG-4265
> URL: https://issues.apache.org/jira/browse/PIG-4265
> Project: Pig
> Issue Type: Bug
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
>
> $PIG_HOME/bin/pig -x local RubyUDFs_10.pig
> #RubyUDFs_10.pig
> a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
> b = group a by name;
> c = foreach b generate group, SUM(a.age), SUM(a.gpa);
> d = foreach c generate $0, $1, (double)((int)$2*100)/100;
> store d into 'local.output/RubyUDFs_10_benchmark.out';
> the result in RubyUDFs_10.out/part
> #grep "david s" RubyUDFs_10.out/part-r-00000
> david steinbeck 266 15.0
> #grep "david s" studenttab10k
> david steinbeck 21 2.44
> david steinbeck 33 1.17
> david steinbeck 42 1.94
> david steinbeck 42 1.35
> david steinbeck 31 2.77
> david steinbeck 40 2.42
> david steinbeck 57 3.91
> when running Ruby_UDFs.pig in spark, the sum(a.gpa) is 16.0 and
> (double)((int)$2*100)/100 will be "david steinbeck 266 16.0".
> when running Ruby_UDFs.pig in mapreduce mode, the sum(a.gpa) is
> 15.999999999999998 and (double)((int)$2*100)/100 will be "david steinbeck
> 266 15.0".
> I don't know why the same code run by different execution engines (spark and
> mapreduce) on the same OS returns different results.