Rishi Pandey created PIG-3907:
---------------------------------
Summary: Built-in function COR does not work with any numeric type
other than double.
Key: PIG-3907
URL: https://issues.apache.org/jira/browse/PIG-3907
Project: Pig
Issue Type: Bug
Components: build, piggybank
Affects Versions: 0.11.1
Reporter: Rishi Pandey
Apache Pig provides the built-in function 'COR' (correlation). COR is used to
calculate the correlation between variables.
The COR function does not work if any input variable has datatype int or long.
We need to explicitly cast the variables to double in the Pig script, which is
never a good idea on the UI end.
I unit tested the correlation function with int values, and it fails while
iterating over the bag. The same happens when the input parameters to COR are
a mix of int, long and double variables. However, my unit test with only
doubles gives the correct output.
I also ran the script on a Hadoop cluster, where it fails if any variable is
not a double.
It shows the following error on the Hadoop cluster:
ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. null
or sometimes:
ERROR 1066: Unable to open iterator for alias aliasName. Backend error : null
The Java code of the COR function casts everything to double, which is
correct. But in the computeAll(--,--) function, the casts applied to the
iterator values to yield x and y create a problem.
Exact code:
double x =(Double)iterator_x.next().get(0); // error when int or long
double y =(Double)iterator_y.next().get(0); // error when int or long
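The failure mode of those two casts can be reproduced outside Pig. The sketch
below is plain Java with no Pig classes; the class name CastDemo and the
helper toDouble are hypothetical and are not part of Pig or piggybank. It
assumes the bag values arrive as boxed Integer, Long, Float or Double objects,
and shows that a direct (Double) cast only accepts a Double, while converting
through java.lang.Number accepts all of them.

```java
import java.util.Arrays;
import java.util.List;

public class CastDemo {
    // Mirrors the cast quoted in the report: it succeeds only when the
    // runtime type of the value is java.lang.Double.
    static double strictCast(Object value) {
        return (Double) value;
    }

    // Hypothetical helper (an assumption, not Pig's code): accepts any boxed
    // numeric type by going through java.lang.Number.
    static double toDouble(Object value) {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        throw new IllegalArgumentException("Expected a numeric value but got: " + value);
    }

    public static void main(String[] args) {
        // One value each of int, long, float and double, boxed as Objects.
        List<Object> values = Arrays.<Object>asList(1, 2L, 3.5f, 4.0);
        for (Object v : values) {
            String strict;
            try {
                strict = String.valueOf(strictCast(v));
            } catch (ClassCastException e) {
                strict = "ClassCastException";
            }
            System.out.println(v.getClass().getSimpleName()
                    + ": strict=" + strict + ", viaNumber=" + toDouble(v));
        }
    }
}
```

Converting through Number is only one possible approach; whether it fits COR's
existing structure would need to be checked against the actual source.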
Solution: one option is to override the getArgToFuncMapping() method and
define separate classes IntCOR, LongCOR and FloatCOR, as is done for some
other UDFs like VAR.
Please fix the issue in piggybank as well as in the built-in library of Pig.
I am using Apache Pig 0.11.