[ 
https://issues.apache.org/jira/browse/PIG-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4774:
------------------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Committed the patch to trunk. Thanks for the review, Daniel.

> Fix NPE in SUM,AVG,MIN,MAX UDFs for null bag input
> --------------------------------------------------
>
>                 Key: PIG-4774
>                 URL: https://issues.apache.org/jira/browse/PIG-4774
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4774-1-withoutwhitespacechanges.patch, 
> PIG-4774-1.patch, PIG-4774-2.patch
>
>
> PIG-4184 addressed the UDF backward-compatibility issue that followed the 
> POStatus.STATUS_NULL refactoring by adding an input.get(0) == null check to 
> the UDFs so that they handle null input. UDFs extending AlgebraicMathBase, 
> AVG, MIN, MAX, etc. were not fixed.
> The script below reproduces the NPE. It is an odd usage that one user had: 
> aggregating after a join instead of right after the group by. Rewriting the 
> script to move the aggregation after the group by fixed the NPE. This might 
> be rare, but there may be other cases where users call these functions on a 
> bag directly, without a group by, which might cause nulls to be passed to 
> them.
> A = LOAD '/tmp/data' as (f1:int, f2:int, f3:int);
> B = LOAD '/tmp/data1' as (f1:int, f2:int, f3:int);
> A1 = GROUP A by f1;
> A2 = FOREACH A1 GENERATE group as f1, $1;
> C = JOIN B by f1 LEFT, A2 by f1;
> D = FOREACH C GENERATE B::f1, (double)SUM(A2::A.f3)/SUM(A2::A.f2);
> STORE D into '/tmp/out';
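
In the script above, rows of B with no match in A2 come out of the left
outer join with a null bag, and that null is what reaches SUM. The kind of
guard the description refers to (an input.get(0) == null check) can be
sketched in plain Java as below. This is only an illustration and not the
committed patch: the class NullSafeSum and its method sum are hypothetical
names, a java.util.List stands in for Pig's bag type, and returning null
for a null or all-null bag is an assumption about the intended result.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an aggregate UDF; it does not use Pig's classes.
public class NullSafeSum {

    /**
     * Sums the non-null values in a bag.
     * Returns null when the bag itself is null (the case that caused the NPE)
     * or when it holds no non-null values.
     */
    public static Long sum(List<Integer> bag) {
        // Guard against a null bag, analogous to checking input.get(0) == null.
        if (bag == null) {
            return null;
        }
        Long total = null;
        for (Integer value : bag) {
            if (value == null) {
                continue; // skip null fields inside the bag
            }
            total = (total == null) ? value.longValue() : total + value;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3))); // prints 6
        System.out.println(sum(null));                   // prints null
    }
}
```

Without the first check, iterating over the null bag would throw a
NullPointerException, which matches the failure mode reported here.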


