[ 
https://issues.apache.org/jira/browse/PIG-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Egil Sorensen updated PIG-3323:
-------------------------------

    Description: 
A Pig script like the one below succeeds, but when I inspect the resulting file 
I find that the default value specification has been stripped from the schema.

{code}
a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: 
int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: 
float,doublenum: double);

b2 = foreach a generate id, intnum5, intnum100;
c2 = filter b2 by 110 <= id and id < 120;
describe c2;
dump c2;
store c2 into ':OUTPATH:.intermediate_2' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage('
{
   "debug" : 5,
   "schema" : {  
      "name" : "schema_2",
      "type" : "record",
      "fields" : [
         {  
            "name" : "id",
            "type" : [
               "null",
               "int"
            ]
         },
         {  
            "name" : "intnum5",
            "type" : [
               "null",
               "int"
            ]
         },
         {
            "name" : "intnum100",
            "type" : [
               "null",
               "int"
            ],
            "default" : 0
         }
      ]
   }
}
');
{code}


BTW, the documentation at https://cwiki.apache.org/PIG/avrostorage.html is 
silent on the subject of defaults, so my first question is: is my expectation 
that the default should be written to the file incorrect?
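
To make the check concrete, here is a minimal sketch of comparing the schema passed to AvroStorage with the schema found in the output file. It uses only the Python standard library and works on schema JSON text. The helper names (fields_with_defaults, dropped_defaults) are hypothetical, and extracting the stored schema from the Avro file is assumed to happen separately and is not shown.

```python
import json

# Schema text mirroring the "schema" value passed to AvroStorage in the script above.
SCHEMA_JSON = """
{
  "name": "schema_2",
  "type": "record",
  "fields": [
    {"name": "id", "type": ["null", "int"]},
    {"name": "intnum5", "type": ["null", "int"]},
    {"name": "intnum100", "type": ["null", "int"], "default": 0}
  ]
}
"""


def fields_with_defaults(schema_text):
    """Return {field name: default} for record fields that declare a default.

    Hypothetical helper: it only looks at the top-level record's fields.
    """
    schema = json.loads(schema_text)
    return {
        field["name"]: field["default"]
        for field in schema.get("fields", [])
        if "default" in field
    }


def dropped_defaults(requested_text, stored_text):
    """Return sorted names of fields whose default is declared in the
    requested schema but missing from the stored schema."""
    requested = fields_with_defaults(requested_text)
    stored = fields_with_defaults(stored_text)
    return sorted(name for name in requested if name not in stored)


if __name__ == "__main__":
    print(fields_with_defaults(SCHEMA_JSON))
```

If the stored schema is passed as the second argument to dropped_defaults, a non-empty result would correspond to the behavior described in this report.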


  was:
I am getting an NPE when using AvroStorage to load a file that has a schema 
like:

{code}
["null",{"type":"record","name":"TUPLE_0","fields":[
{"name":"name","type":["null","string"],"doc":"autogenerated from Pig Field Schema"},
{"name":"age","type":["null","int"],"doc":"autogenerated from Pig Field Schema"},
{"name":"gpa","type":["null","double"],"doc":"autogenerated from Pig Field Schema"}]}]
{code}


E.g. see the e2e style test, which fails on this:

{code}
                        {
                        'num' => 4,
                        # storing file with Pig type tuple relying on conversion to record
                        # loading using stored schemas 
                        'notmq' => 1,
                        'pig' => q\
a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as 
(m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, 
gpa:double)});
b = foreach a generate t;
describe b;
store b into ':OUTPATH:.intermediate' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage();

exec;

-- Read back what was stored with Avro
u = load ':OUTPATH:.intermediate' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage();
describe u;
store u into ':OUTPATH:';
\,
                        'verify_pig_script' => q\
a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as 
(m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, 
gpa:double)});
b = foreach a generate t;
describe b;
store b into ':OUTPATH:';
\,
                        },
{code}





    
> AVRO: default value not stored in file when given as parameter to AvroStorage
> -----------------------------------------------------------------------------
>
>                 Key: PIG-3323
>                 URL: https://issues.apache.org/jira/browse/PIG-3323
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.11.2
>            Reporter: Egil Sorensen
>            Assignee: Viraj Bhat
>              Labels: patch
>             Fix For: 0.12, 0.11.2
>
>
