[
https://issues.apache.org/jira/browse/PIG-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey updated PIG-3408:
------------------------
Description:
Hi, I have a jython UDF with schema
{code}
@outputSchema("splitted_pivots:tuple(route_pivots:bag{tuple()},
last_event:bag{tuple()})")
def split_last_end_points(bag_with_pivots, startOfHour, tuple_schema_as_str):
#some code goes here
return current_hour_pivots, [last_event_pivot]
{code}
last_event_pivot should contain one tuple. By default it's None.
It's normal case for the udf to return last_event_pivot=None
Then I try to store this value:
{code}
lastEvents24FromCurrHour = FOREACH pivotsWithEndPoints generate FLATTEN
(splitted_pivots.last_event) as (msisdn: long, more_fields)
--Stupid hack split_last_end_points can return null for
--lastEvents24FromCurrHourFiltered = FILTER lastEvents24FromCurrHour by
is_end_point is not null and end_point_type is not null;
STORE lastEvents24FromCurrHour INTO '$lastEndPoints24hOut'
USING
org.apache.pig.piggybank.storage.avro.AvroStorage('index', '4', 'schema',
'{"name": "last_end_points_24h", "doc": "version 0.0.1", "type": "record",
"fields": [
{"name": "msisdn", "type": "long"},
{"name": "more_fields", "type": "int"}
]}');
{code}
And get exception:
{code}
Error running child
org.apache.avro.file.DataFileWriter$AppendWriteException:
java.lang.NullPointerException: null of last_end_points_24h of
last_end_points_24h
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
at
org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
at
org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:146)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:241)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:465)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:433)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NullPointerException: null of last_end_points_24h of
last_end_points_24h
at
org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.npe(PigAvroDatumWriter.java:323)
at
org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:102)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
... 19 more
Caused by: java.lang.NullPointerException
at
org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.getField(PigAvroDatumWriter.java:385)
at
org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:363)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at
org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
... 21 more
{code}
I suppose that AvroStorage should correctly handle null tuples of relations
consisting of null. Am I right?
was:
Hi, I have a jython UDF with schema
{code}
@outputSchema("splitted_pivots:tuple(route_pivots:bag{tuple()},
last_event:bag{tuple()})")
def split_last_end_points(bag_with_pivots, startOfHour, tuple_schema_as_str):
#some code goes here
return current_hour_pivots, [last_event_pivot]
{code}
last_event_pivot should contain one tuple. By default it's None.
It's normal case for the udf to return last_event_pivot=None
Then I try to store this value:
{code}
lastEvents24FromCurrHour = FOREACH pivotsWithEndPoints generate FLATTEN
(splitted_pivots.last_event) as (msisdn: long, more_fields)
--Stupid hack split_last_end_points can return null for
--lastEvents24FromCurrHourFiltered = FILTER lastEvents24FromCurrHour by
is_end_point is not null and end_point_type is not null;
STORE lastEvents24FromCurrHourFiltered INTO '$lastEndPoints24hOut'
USING
org.apache.pig.piggybank.storage.avro.AvroStorage('index', '4', 'schema',
'{"name": "last_end_points_24h", "doc": "version 0.0.1", "type": "record",
"fields": [
{"name": "msisdn", "type": "long"},
{"name": "more_fields", "type": "int"}
]}');
{code}
And get exception:
{code}
Error running child
org.apache.avro.file.DataFileWriter$AppendWriteException:
java.lang.NullPointerException: null of last_end_points_24h of
last_end_points_24h
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
at
org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
at
org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:146)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:241)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:465)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:433)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NullPointerException: null of last_end_points_24h of
last_end_points_24h
at
org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.npe(PigAvroDatumWriter.java:323)
at
org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:102)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
... 19 more
Caused by: java.lang.NullPointerException
at
org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.getField(PigAvroDatumWriter.java:385)
at
org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:363)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at
org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
... 21 more
{code}
I suppose that AvroStorage should correctly handle null tuples of relations
consisting of null. Am I right?
> AvroStorage fails to save relation with single null tuple
> ---------------------------------------------------------
>
> Key: PIG-3408
> URL: https://issues.apache.org/jira/browse/PIG-3408
> Project: Pig
> Issue Type: Bug
> Components: data
> Affects Versions: 0.11
> Environment: cluster, local
> Reporter: Sergey
>
> Hi, I have a jython UDF with schema
> {code}
> @outputSchema("splitted_pivots:tuple(route_pivots:bag{tuple()},
> last_event:bag{tuple()})")
> def split_last_end_points(bag_with_pivots, startOfHour, tuple_schema_as_str):
> #some code goes here
> return current_hour_pivots, [last_event_pivot]
> {code}
> last_event_pivot should contain one tuple. By default it's None.
> It's normal case for the udf to return last_event_pivot=None
> Then I try to store this value:
> {code}
> lastEvents24FromCurrHour = FOREACH pivotsWithEndPoints generate FLATTEN
> (splitted_pivots.last_event) as (msisdn: long, more_fields)
> --Stupid hack split_last_end_points can return null for
> --lastEvents24FromCurrHourFiltered = FILTER lastEvents24FromCurrHour by
> is_end_point is not null and end_point_type is not null;
> STORE lastEvents24FromCurrHour INTO '$lastEndPoints24hOut'
> USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('index', '4', 'schema',
> '{"name": "last_end_points_24h", "doc": "version 0.0.1", "type": "record",
> "fields": [
> {"name": "msisdn", "type": "long"},
> {"name": "more_fields", "type": "int"}
> ]}');
> {code}
> And get exception:
> {code}
> Error running child
> org.apache.avro.file.DataFileWriter$AppendWriteException:
> java.lang.NullPointerException: null of last_end_points_24h of
> last_end_points_24h
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
> at
> org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
> at
> org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:146)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:241)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:465)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:433)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
> at
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.lang.NullPointerException: null of last_end_points_24h of
> last_end_points_24h
> at
> org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.npe(PigAvroDatumWriter.java:323)
> at
> org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:102)
> at
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
> ... 19 more
> Caused by: java.lang.NullPointerException
> at
> org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.getField(PigAvroDatumWriter.java:385)
> at
> org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:363)
> at
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
> at
> org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
> ... 21 more
> {code}
> I suppose that AvroStorage should correctly handle null tuples of relations
> consisting of null. Am I right?
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira