[
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Egil Sorensen updated PIG-3322:
-------------------------------
Description:
I am getting an NPE when using AvroStorage to load a file that has a schema
like:
{code}
["null",
 {"type":"record","name":"TUPLE_0","fields":[
   {"name":"name","type":["null","string"],"doc":"autogenerated from Pig Field Schema"},
   {"name":"age","type":["null","int"],"doc":"autogenerated from Pig Field Schema"},
   {"name":"gpa","type":["null","double"],"doc":"autogenerated from Pig Field Schema"}]}]
{code}
E.g. see the e2e style test, which fails on this:
{code}
{
'num' => 4,
# storing file with Pig type tuple relying on conversion to record
# loading using stored schemas
'notmq' => 1,
'pig' => q\
a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as
(m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int,
gpa:double)});
b = foreach a generate t;
describe b;
store b into ':OUTPATH:.intermediate' USING
org.apache.pig.piggybank.storage.avro.AvroStorage();
exec;
-- Read back what was stored with Avro
u = load ':OUTPATH:.intermediate' USING
org.apache.pig.piggybank.storage.avro.AvroStorage();
describe u;
store u into ':OUTPATH:';
\,
'verify_pig_script' => q\
a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as
(m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int,
gpa:double)});
b = foreach a generate t;
describe b;
store b into ':OUTPATH:';
\,
},
{code}
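For illustration only (this is not taken from the AvroStorage source): the schema quoted above is a top-level union of "null" and a record, rather than a bare record. The minimal Python sketch below, using only the standard json module, shows what unwrapping such a union to its single non-null branch looks like. The helper name unwrap_nullable_union is hypothetical, and the "doc" attributes are omitted for brevity.

```python
import json

# Schema shape from the report: a top-level union of "null" and a record.
SCHEMA_JSON = """
["null",
 {"type": "record", "name": "TUPLE_0",
  "fields": [
    {"name": "name", "type": ["null", "string"]},
    {"name": "age",  "type": ["null", "int"]},
    {"name": "gpa",  "type": ["null", "double"]}
  ]}]
"""


def unwrap_nullable_union(schema):
    """Return the single non-null branch of a ["null", X] union.

    Hypothetical helper for illustration. Anything that is not a union
    of "null" and exactly one other branch is returned unchanged.
    """
    if isinstance(schema, list):
        non_null = [branch for branch in schema if branch != "null"]
        if len(non_null) == 1 and len(non_null) < len(schema):
            return non_null[0]
    return schema


top = json.loads(SCHEMA_JSON)          # a JSON array, i.e. an Avro union
record = unwrap_nullable_union(top)    # the TUPLE_0 record inside the union
field_types = {
    f["name"]: unwrap_nullable_union(f["type"]) for f in record["fields"]
}
print(type(top).__name__, record["type"], field_types)
```

A reader that assumes the top-level schema is always a record would have to handle the list case first, as the helper does here.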
was:
Somewhat different use case than PIG-3318:
Loading with AvroStorage, I gave a loader schema that, relative to the schema
in the Avro file, had an extra field without a default. I expected to see an
extra empty column, but the resulting schema is the one in the Avro file,
without the extra column.
E.g. see the e2e style test, which fails on this:
{code}
{
'num' => 2,
# storing using writer schema
# loading using reader schema with extra field that has no default
'notmq' => 1,
'pig' => q\
a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000:
int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum:
float,doublenum: double);
-- Store Avro file w. schema
b1 = foreach a generate id, intnum5;
c1 = filter b1 by 10 <= id and id < 20;
describe c1;
dump c1;
store c1 into ':OUTPATH:.intermediate_1' USING
org.apache.pig.piggybank.storage.avro.AvroStorage('
{
"schema" : {
"name" : "schema_writing",
"type" : "record",
"fields" : [
{
"name" : "id",
"type" : [
"null",
"int"
]
},
{
"name" : "intnum5",
"type" : [
"null",
"int"
]
}
]
}
}
');
exec;
-- Read back what was stored with Avro adding extra field to reader schema
u = load ':OUTPATH:.intermediate_1' USING
org.apache.pig.piggybank.storage.avro.AvroStorage('
{
"debug" : 5,
"schema" : {
"name" : "schema_reading",
"type" : "record",
"fields" : [
{
"name" : "id",
"type" : [
"null",
"int"
]
},
{
"name" : "intnum5",
"type" : [
"null",
"string"
]
},
{
"name" : "intnum100",
"type" : [
"null",
"int"
]
}
]
}
}
');
describe u;
dump u;
store u into ':OUTPATH:';
\,
'verify_pig_script' => q\
a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000:
int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum:
float,doublenum: double);
b = filter a by (10 <= id and id < 20);
c = foreach b generate id, intnum5, '';
store c into ':OUTPATH:';
\,
},
{code}
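As a sketch of the situation in the earlier description (not AvroStorage code): the reader schema schema_reading adds a field that the writer schema schema_writing does not have, and that field has no default. The Python below, using only the standard json module, lists such fields. The function name reader_fields_without_source is hypothetical. Note that under the Avro specification's schema resolution rules, a reader field with no default and no same-named writer field is signalled as an error.

```python
import json

# Writer and reader schemas as given in the test above.
WRITER = json.loads("""
{"name": "schema_writing", "type": "record",
 "fields": [{"name": "id",      "type": ["null", "int"]},
            {"name": "intnum5", "type": ["null", "int"]}]}
""")

READER = json.loads("""
{"name": "schema_reading", "type": "record",
 "fields": [{"name": "id",        "type": ["null", "int"]},
            {"name": "intnum5",   "type": ["null", "string"]},
            {"name": "intnum100", "type": ["null", "int"]}]}
""")


def reader_fields_without_source(writer, reader):
    """Names of reader fields that are absent from the writer schema
    and declare no "default" (hypothetical helper for illustration)."""
    writer_names = {f["name"] for f in writer["fields"]}
    return [
        f["name"]
        for f in reader["fields"]
        if f["name"] not in writer_names and "default" not in f
    ]


missing = reader_fields_without_source(WRITER, READER)
print(missing)
```

For these two schemas the only field reported is intnum100.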
> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -------------------------------------------------------------------------
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12, 0.11.2
>
>
> I am getting an NPE when using AvroStorage to load a file that has a schema
> like:
> {code}
> ["null",
>  {"type":"record","name":"TUPLE_0","fields":[
>    {"name":"name","type":["null","string"],"doc":"autogenerated from Pig Field Schema"},
>    {"name":"age","type":["null","int"],"doc":"autogenerated from Pig Field Schema"},
>    {"name":"gpa","type":["null","double"],"doc":"autogenerated from Pig Field Schema"}]}]
> {code}
> E.g. see the e2e style test, which fails on this:
> {code}
> {
> 'num' => 4,
> # storing file with Pig type tuple relying on conversion to record
> # loading using stored schemas
> 'notmq' => 1,
> 'pig' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as
> (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray,
> age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:.intermediate' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> exec;
> -- Read back what was stored with Avro
> u = load ':OUTPATH:.intermediate' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> describe u;
> store u into ':OUTPATH:';
> \,
> 'verify_pig_script' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as
> (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray,
> age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:';
> \,
> },
> {code}