[
https://issues.apache.org/jira/browse/PIG-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660959#comment-13660959
]
Viraj Bhat commented on PIG-3323:
---------------------------------
Hi Egil,
I looked at the Avro specification for unions and default values, and at the
source code of "PigAvroDatumWriter".
The field "intnum100" is a UNION of "null" and "int", so its value can be
either a "null" or an "int".
That means that if Pig does not find a value for "intnum100" in the step
before the store, it will generate null, which is perfectly acceptable here. So
the default value makes no sense here when the item does not exist.
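
As a rough illustration of the point above, here is a minimal sketch in plain
Python. It is not the PigAvroDatumWriter code: the helper name matches_union is
hypothetical, and it only handles the "null" and "int" branches used in this
issue. It shows that a missing value already fits the declared union:

```python
# Hypothetical helper, not part of Pig or Avro: reports which branch of a
# simplified Avro union (only "null" and "int" are handled) accepts a value.
def matches_union(value, branches):
    """Return the name of the first branch that accepts value, or None."""
    for branch in branches:
        if branch == "null" and value is None:
            return branch
        # bool is a subclass of int in Python, so exclude it explicitly.
        if branch == "int" and isinstance(value, int) and not isinstance(value, bool):
            return branch
    return None


# The union declared for "intnum100" in the schema passed to AvroStorage.
intnum100_type = ["null", "int"]

print(matches_union(None, intnum100_type))  # a missing value fits the "null" branch
print(matches_union(42, intnum100_type))    # an integer fits the "int" branch
```
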
Also, if you remove "null" from the specification of "intnum100" and hope that
the default value is written out, there is another problem:
If you read the specification for Unions
http://avro.apache.org/docs/current/spec.html#Unions plus
the section on Default Values
http://avro.apache.org/docs/current/spec.html#schema_complex
you will see that Union does not have any default values in the specification.
Closing as INVALID.
Regards
Viraj
> AVRO: default value not stored in file when given as parameter to AvroStorage
> -----------------------------------------------------------------------------
>
> Key: PIG-3323
> URL: https://issues.apache.org/jira/browse/PIG-3323
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12, 0.11.2
>
>
> A Pig script like the one below succeeds, but on inspecting the resulting
> file I find that the schema is stripped of the default value specification.
> {code}
> a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000:
> int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum:
> float,doublenum: double);
> b2 = foreach a generate id, intnum5, intnum100;
> c2 = filter b2 by 110 <= id and id < 120;
> describe c2;
> dump c2;
> store c2 into ':OUTPATH:.intermediate_2' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('
> {
> "debug" : 5,
> "schema" : {
> "name" : "schema_2",
> "type" : "record",
> "fields" : [
> {
> "name" : "id",
> "type" : [
> "null",
> "int"
> ]
> },
> {
> "name" : "intnum5",
> "type" : [
> "null",
> "int"
> ]
> },
> {
> "name" : "intnum100",
> "type" : [
> "null",
> "int"
> ],
> "default" : 0
> }
> ]
> }
> }
> ');
> {code}
> BTW, the documentation on https://cwiki.apache.org/PIG/avrostorage.html is
> silent on the subject of defaults, so my first question is: is my expectation
> that the default should be written to the file incorrect?