Sergey created PIG-3408:
---------------------------

             Summary: AvroStorage fails to save relation with single null tuple
                 Key: PIG-3408
                 URL: https://issues.apache.org/jira/browse/PIG-3408
             Project: Pig
          Issue Type: Bug
          Components: data
    Affects Versions: 0.11
         Environment: cluster, local
            Reporter: Sergey


Hi, I have a jython UDF with schema

{code}
@outputSchema("splitted_pivots:tuple(route_pivots:bag{tuple()}, last_event:bag{tuple()})")
def split_last_end_points(bag_with_pivots, startOfHour, tuple_schema_as_str):
    # some code goes here
    return current_hour_pivots, [last_event_pivot]
{code}
last_event_pivot should contain one tuple; by default it is None.
It is a normal case for the UDF to return last_event_pivot=None, in which case the last_event bag holds a single null tuple.
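
A possible workaround on the UDF side, as a minimal sketch only. The helper name as_bag is hypothetical and not part of the original UDF. It assumes that Pig's Jython integration maps a Python list to a bag and None to null, so that [None] becomes a bag with one null tuple while an empty list becomes an empty bag, which FLATTEN expands to zero rows.

```python
def as_bag(last_event_pivot):
    """Wrap an optional tuple in a list (a Pig bag on the Jython side).

    Hypothetical helper: returns an empty list instead of [None], so the
    resulting bag contains no null tuple.
    """
    if last_event_pivot is None:
        return []
    return [last_event_pivot]
```

With such a helper, the UDF would end with `return current_hour_pivots, as_bag(last_event_pivot)` instead of wrapping the value in a list unconditionally. This only sidesteps the problem in the script; it does not change how AvroStorage treats null tuples.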

Then I try to store this value:
{code}
lastEvents24FromCurrHour = FOREACH pivotsWithEndPoints generate FLATTEN(splitted_pivots.last_event) as (msisdn: long, more_fields);


--Stupid hack split_last_end_points can return null for
--lastEvents24FromCurrHourFiltered = FILTER lastEvents24FromCurrHour by is_end_point is not null and end_point_type is not null;
STORE lastEvents24FromCurrHourFiltered INTO '$lastEndPoints24hOut'
USING
org.apache.pig.piggybank.storage.avro.AvroStorage('index', '4', 'schema', 
'{"name": "last_end_points_24h", "doc": "version 0.0.1", "type": "record", 
"fields": [
   {"name": "msisdn",        "type": "long"},
   {"name": "more_fields",   "type": "int"}
]}');
{code}

And get exception:
{code}
Error running child
org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of last_end_points_24h of last_end_points_24h
        at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
        at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:146)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:241)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:465)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:433)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NullPointerException: null of last_end_points_24h of last_end_points_24h
        at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.npe(PigAvroDatumWriter.java:323)
        at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:102)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
        at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
        ... 19 more
Caused by: java.lang.NullPointerException
        at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.getField(PigAvroDatumWriter.java:385)
        at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:363)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
        at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
        ... 21 more
{code}

I would expect AvroStorage to handle a relation that consists of a single null tuple without throwing an exception. Am I right?


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
