[ 
https://issues.apache.org/jira/browse/PIG-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4774:
------------------------------------
    Description: 
To address the UDF backward compatibility issue that followed the 
POStatus.STATUS_NULL refactoring, PIG-4184 fixed the UDFs to handle null by 
adding an input.get(0) == null check to all of them. UDFs extending 
AlgebraicMathBase, AVG, MIN, MAX, etc. were not fixed.

Script to reproduce the NPE. It is an odd usage, doing the aggregation after a 
join instead of after a group by, which one user was doing; rewriting the 
script to move the aggregation after the group by fixed the NPE. It might be 
rare, but there might be other cases where users call those functions with a 
bag directly, without a group by, which might cause nulls to be passed to them.

A = LOAD '/tmp/data' as (f1:int, f2:int, f3:int);
B = LOAD '/tmp/data1' as (f1:int, f2:int, f3:int);
A1 = GROUP A by f1;
A2 = FOREACH A1 GENERATE group as f1, $1;
C = JOIN B by f1 LEFT, A2 by f1;
D = FOREACH C GENERATE B::f1, (double)SUM(A2::A.f3)/SUM(A2::A.f2);
STORE D into '/tmp/out';
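
For illustration only, here is a minimal sketch of the kind of guard the 
input.get(0) == null check provides. It is not the code from the patch and 
does not use Pig's classes: the class NullSafeSum and its sum method are 
hypothetical, a java.util.List stands in for the input Tuple, and a 
List<Long> stands in for the bag. Returning null when the bag holds no 
non-null values is a choice made for this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NullSafeSum {
    // Hypothetical stand-in for a SUM-style UDF's exec method: "input" plays
    // the role of the Tuple and its first field plays the role of the bag.
    @SuppressWarnings("unchecked")
    public static Long sum(List<Object> input) {
        // The guard: a LEFT join row with no match on the right side carries
        // a null bag, so return null instead of dereferencing it.
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return null;
        }
        List<Long> bag = (List<Long>) input.get(0);
        long total = 0;
        boolean sawValue = false;
        for (Long v : bag) {
            if (v != null) { // skip null fields inside the bag
                total += v;
                sawValue = true;
            }
        }
        // Sketch-level choice: no non-null values yields null.
        return sawValue ? total : null;
    }

    public static void main(String[] args) {
        // A joined row whose bag is present.
        List<Object> matched =
            Collections.<Object>singletonList(Arrays.asList(1L, 2L, 3L));
        // A joined row with no match, so the bag field is null.
        List<Object> unmatched = Collections.<Object>singletonList(null);
        System.out.println(sum(matched));
        System.out.println(sum(unmatched));
    }
}
```

Without the first check, the cast succeeds on a null field and the loop over 
the bag then throws the NullPointerException.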

  was:
For UDF backward compatibility issue after POStatus.STATUS_NULL refactory 
issue, PIG-4184 fixed the udfs to handle null by adding input.get(0) == null 
check in all the UDFs. UDFs extending AlgebraicMathBase which handle SUM, 
COUNT, etc was not fixed.

Script to reproduce NPE. It is an odd usage doing aggregation after join 
instead of group by which one user was doing and rewrite moving aggregation 
after group by fixed the NPE. Might be rare, but there might be other cases 
where user call those functions with a bag directly without group by which 
might cause nulls to be passed to it.

A = LOAD '/tmp/data' as (f1:int, f2:int, f3:int);
B = LOAD '/tmp/data1' as (f1:int, f2:int, f3:int);
A1 = GROUP A by f1;
A2 = FOREACH A1 GENERATE group as f1, $1;
C = JOIN B by f1 LEFT, A2 by f1;
D = FOREACH C GENERATE B::f1, (double)SUM(A2::A.f3)/SUM(A2::A.f2);
STORE D into '/tmp/out';

        Summary: Fix NPE in SUM,AVG,MIN,MAX UDFs for null bag input  (was: Fix 
NPE in AlgebraicMathBase UDFs)

  Added null check in TestBuiltin.java instead of a script as that covers all 
the different UDFs.

> Fix NPE in SUM,AVG,MIN,MAX UDFs for null bag input
> --------------------------------------------------
>
>                 Key: PIG-4774
>                 URL: https://issues.apache.org/jira/browse/PIG-4774
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4774-1-withoutwhitespacechanges.patch, 
> PIG-4774-1.patch, PIG-4774-2.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
