[ https://issues.apache.org/jira/browse/PIG-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874006#comment-13874006 ]

Hiten Java commented on PIG-3668:
---------------------------------

If you look at the formula for the Pearson coefficient, when there is no
deviation at all among the values of x (or of y), the result is NaN because of
a 'divide by zero'.
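For reference, the standard sample Pearson coefficient is shown below. The exact arithmetic inside COR may be arranged differently (for example, from running sums), so treat this as the textbook definition rather than the UDF's code. When every x_i equals the mean, both sums involving x are zero, so the expression becomes 0/0, which IEEE 754 double arithmetic evaluates to NaN:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

x_i = \bar{x}\ \ \forall i
\;\Rightarrow\; \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 0
\ \text{and}\ \sum_{i=1}^{n} (x_i - \bar{x})^2 = 0
\;\Rightarrow\; r_{xy} = \tfrac{0}{0} = \mathrm{NaN}
```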

Before the change, the user saw NaN in the output for every coefficient. Now,
NaN is shown only for the affected columns.

I have only handled the cases where there is a possibility of 'divide by zero'.


> COR built-in function when at least one of the coefficient values is NaN
> -----------------------------------------------------------------------
>
>                 Key: PIG-3668
>                 URL: https://issues.apache.org/jira/browse/PIG-3668
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.12.0, 0.11.1, 0.12.1
>            Reporter: Hiten Java
>            Assignee: Hiten Java
>         Attachments: CORR.diff
>
>
> When passing multiple columns to COR for correlation analysis, if the
> coefficient for one of the column combinations is NaN, the values for all
> other combinations are not computed.
> The Pearson coefficient is NaN if all values in a given column are the same.
> Example:
>   A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
>   B = group A all;
>   c = foreach B generate group, FLATTEN(COR(
>           (bag{tuple(double)}) A.col_1,
>           (bag{tuple(double)}) A.col_2,
>           (bag{tuple(double)}) A.col_3,
>           (bag{tuple(double)}) A.col_4));
> If the Pearson coefficient for col_1 and col_2 is NaN, then the coefficients
> for all combinations are NaN.
> This happens because of the 'return null' statements in the catch blocks on
> lines 157 and 235 of org.apache.pig.builtin.COR.java.
> If these catch blocks are removed, the correlation analysis would continue
> for the remaining columns (Apache Pig 0.12.0).
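The behaviour requested in this issue (one NaN pair should not hide the others) can be sketched in plain Java. This is a minimal illustration under simplifying assumptions: each column is a `double[]` of equal length, and the class `PairwiseCorrelation` with its methods `pearson` and `allPairs` is hypothetical, not the actual `org.apache.pig.builtin.COR` code (which operates on Pig bags and tuples). The pair ordering (0,1), (0,2), ..., (k-2,k-1) is also an assumption of the sketch.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not the real COR UDF: computes the Pearson
 * coefficient for every pair of columns independently, so a column with
 * no deviation yields NaN only for the pairs that contain it.
 */
public class PairwiseCorrelation {

    /** Pearson coefficient of x and y (assumed equal length); NaN if either has no deviation. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sums of products of deviations from the mean.
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // Handle the 'divide by zero' case explicitly: report NaN for this pair.
        if (sxx == 0.0 || syy == 0.0) {
            return Double.NaN;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Coefficients for pairs (0,1), (0,2), ..., (k-2,k-1), in that order. */
    static double[] allPairs(double[][] columns) {
        int k = columns.length;
        double[] result = new double[k * (k - 1) / 2];
        int idx = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                // Each pair is computed on its own; a NaN here does not
                // abort the remaining pairs (unlike returning null for all).
                result[idx++] = pearson(columns[i], columns[j]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] cols = {
            {2.0, 2.0, 2.0},  // col_1: constant, no deviation
            {1.0, 2.0, 3.0},  // col_2
            {2.0, 4.0, 6.0}   // col_3: exactly 2 * col_2
        };
        System.out.println(Arrays.toString(allPairs(cols)));
    }
}
```

Running `main` prints `[NaN, NaN, 1.0]`: both pairs involving the constant first column are NaN, while the col_2/col_3 pair is still computed. In Java, `0.0 / 0.0` already evaluates to `NaN` without throwing; the explicit check just makes the 'divide by zero' case visible in the code.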



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
