liyunzhang_intel created PIG-5230:
-------------------------------------

             Summary: Fix the RuntimeException throws in SecondaryKeySortUtil
                 Key: PIG-5230
                 URL: https://issues.apache.org/jira/browse/PIG-5230
             Project: Pig
          Issue Type: Bug
            Reporter: liyunzhang_intel
            Assignee: liyunzhang_intel


there is possibility that [curKey is null| 
https://github.com/apache/pig/blob/63968e3132ad1fee06dffcacb8ea5d399e0edef5/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SecondaryKeySortUtil.java#L116]
 after PIG-5164.  we should remove the code to avoid RuntimeException.


following script can trigger the exception.
{code}
a = load './studenttab10k.mk1' as (name, age:int, gpa:float);
a1 = filter a by gpa is null or gpa >= 3.9;
a2 = filter a by gpa < 2;
b = union a1, a2;
c = load './voternulltab10k' as (name, age, registration, contributions);
d = join b by name left outer, c by name using 'replicated';
e = stream d through `cat` as (name, age, gpa, name1, age1, registration, 
contributions);
f = foreach e generate name, age, gpa, registration, contributions;
g = group f by name;
g1 = group f by name; -- Two separate groupbys to ensure secondary key 
partitioner
h = foreach g { 
    inner1 = order f by age, gpa, registration, contributions;
    inner2 = limit inner1 1;
    generate inner2, SUM(f.age); };
i = foreach g1 {
    inner1 = order f by age asc, gpa desc, registration asc, contributions desc;
    inner2 = limit inner1 1;
    generate inner2, SUM(f.age); };
store h into './MultiQuery_Union_3.1.out';
store i into './MultiQuery_Union_3.2.out';
{code}


cat studenttab10k.mk1
{code}
ulysses thompson        64      1.90
katie carson    25      3.65
        65      0.73
holly davidson  57      2.43
fred miller     55      3.77

{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to