liyunzhang_intel created PIG-5230:
-------------------------------------
Summary: Fix the RuntimeException throws in SecondaryKeySortUtil
Key: PIG-5230
URL: https://issues.apache.org/jira/browse/PIG-5230
Project: Pig
Issue Type: Bug
Reporter: liyunzhang_intel
Assignee: liyunzhang_intel
There is a possibility that [curKey is null|
https://github.com/apache/pig/blob/63968e3132ad1fee06dffcacb8ea5d399e0edef5/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SecondaryKeySortUtil.java#L116]
after PIG-5164. We should remove that code to avoid the RuntimeException.
The following script triggers the exception:
{code}
a = load './studenttab10k.mk1' as (name, age:int, gpa:float);
a1 = filter a by gpa is null or gpa >= 3.9;
a2 = filter a by gpa < 2;
b = union a1, a2;
c = load './voternulltab10k' as (name, age, registration, contributions);
d = join b by name left outer, c by name using 'replicated';
e = stream d through `cat` as (name, age, gpa, name1, age1, registration,
contributions);
f = foreach e generate name, age, gpa, registration, contributions;
g = group f by name;
g1 = group f by name; -- Two separate groupbys to ensure secondary key partitioner
h = foreach g {
    inner1 = order f by age, gpa, registration, contributions;
    inner2 = limit inner1 1;
    generate inner2, SUM(f.age); };
i = foreach g1 {
    inner1 = order f by age asc, gpa desc, registration asc, contributions desc;
    inner2 = limit inner1 1;
    generate inner2, SUM(f.age); };
store h into './MultiQuery_Union_3.1.out';
store i into './MultiQuery_Union_3.2.out';
{code}
Output of cat studenttab10k.mk1 (note the row with an empty name):
{code}
ulysses thompson 64 1.90
katie carson 25 3.65
65 0.73
holly davidson 57 2.43
fred miller 55 3.77
{code}
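To illustrate why a null key does not have to be an error, here is a minimal Java sketch of grouping already-sorted (key, value) pairs into runs while treating null as an ordinary key. It is not the code in SecondaryKeySortUtil and does not use any Pig API. The class name NullKeyGrouping, the method groupRuns, and the sample values are hypothetical and chosen only for illustration. The point is that the "no group opened yet" state is tracked separately from the key, so a null key never needs to raise a RuntimeException.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; not the Pig implementation.
public class NullKeyGrouping {

    // Groups consecutive entries that share a key. The input is assumed to be
    // sorted by key already, and a null key is accepted as a valid key.
    public static List<Map.Entry<String, List<Integer>>> groupRuns(
            List<Map.Entry<String, Integer>> sorted) {
        List<Map.Entry<String, List<Integer>>> out = new ArrayList<>();
        String curKey = null;
        // curValues == null means "no group opened yet". This is tracked
        // separately from curKey, because curKey itself may legitimately be null.
        List<Integer> curValues = null;
        for (Map.Entry<String, Integer> e : sorted) {
            // Objects.equals is null-safe, so comparing against a null key is fine.
            if (curValues == null || !Objects.equals(curKey, e.getKey())) {
                curKey = e.getKey();
                curValues = new ArrayList<>();
                out.add(new SimpleEntry<String, List<Integer>>(curKey, curValues));
            }
            curValues.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative (name, age) rows; the first row has a null name.
        List<Map.Entry<String, Integer>> rows = new ArrayList<>();
        rows.add(new SimpleEntry<String, Integer>(null, 65));
        rows.add(new SimpleEntry<String, Integer>("fred miller", 55));
        rows.add(new SimpleEntry<String, Integer>("fred miller", 60));
        System.out.println(groupRuns(rows));
    }
}
```

Whether a null key should form its own group or be skipped is a design decision for the actual fix; the sketch only shows that the iteration can proceed without throwing.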
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)