[ 
https://issues.apache.org/jira/browse/PIG-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658680#comment-13658680
 ] 

Viraj Bhat commented on PIG-3320:
---------------------------------

Hi all, 
What I found is that if you supply a user-defined schema that differs from 
the schema of the actual data, no reconciliation happens. We have to 
reconcile on a case-by-case basis, using the same logic that 
multiple_schemas uses.

After changing part of the source code to read the user-defined schema, it 
throws the following error. I think this is valid, considering that the 
script previously passed and returned results without the extra column.

java.lang.Exception: java.io.IOException: org.apache.avro.AvroTypeException: 
Found {
  "type" : "record",
  "name" : "schema_writing",
  "fields" : [ {
    "name" : "id",
    "type" : [ "null", "int" ]
  }, {
    "name" : "intnum5",
    "type" : [ "null", "int" ]
  } ]
}, expecting {
  "type" : "record",
  "name" : "schema_reading",
  "fields" : [ {
    "name" : "id",
    "type" : [ "null", "int" ]
  }, {
    "name" : "intnum5",
    "type" : [ "null", "string" ]
  }, {
    "name" : "intnum100",
    "type" : [ "null", "int" ]
  } ]
}
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:399)
Caused by: java.io.IOException: org.apache.avro.AvroTypeException: Found {
  "type" : "record",
  "name" : "schema_writing",
  "fields" : [ {
    "name" : "id",
    "type" : [ "null", "int" ]
  }, {
    "name" : "intnum5",
    "type" : [ "null", "int" ]
  } ]
}, expecting {
  "type" : "record",
  "name" : "schema_reading",
  "fields" : [ {
    "name" : "id",
    "type" : [ "null", "int" ]
  }, {
    "name" : "intnum5",
    "type" : [ "null", "string" ]
  }, {
    "name" : "intnum100",
    "type" : [ "null", "int" ]
  } ]
}
        at 
org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:370)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
        at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:497)
        at 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
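The resolution rule behind this AvroTypeException comes from the Avro spec: a reader field that has no matching writer field must declare a default, otherwise resolution fails. A minimal stand-alone sketch of just that rule (plain Python, no Avro dependency; the simplified field dicts and the `resolve_record` helper are illustrative assumptions, not Avro's actual API):

```python
def resolve_record(writer_fields, reader_fields):
    """Sketch of Avro's record-resolution rule for reader-only fields.

    Each field is a dict like {"name": ..., "default": ...}. Returns,
    per reader field, where its value would come from, or raises when a
    reader-only field lacks a default (the AvroTypeException case above).
    """
    writer_by_name = {f["name"]: f for f in writer_fields}
    plan = {}
    for rf in reader_fields:
        name = rf["name"]
        if name in writer_by_name:
            plan[name] = "from writer"     # value is read from the data
        elif "default" in rf:
            plan[name] = rf["default"]     # filled in from the default
        else:
            raise ValueError(
                "reader field %r has no matching writer field and no default"
                % name)
    return plan

# Writer schema from the store above: id, intnum5
writer = [{"name": "id"}, {"name": "intnum5"}]

# Reader schema from the load: id, intnum5, intnum100 (no default)
reader_no_default = [{"name": "id"}, {"name": "intnum5"},
                     {"name": "intnum100"}]
try:
    resolve_record(writer, reader_no_default)
except ValueError as e:
    print(e)  # intnum100 has no default, so resolution fails

# With a default, the extra field resolves to an empty column instead
reader_with_default = [{"name": "id"}, {"name": "intnum5"},
                       {"name": "intnum100", "default": None}]
print(resolve_record(writer, reader_with_default))
```

This only models the missing-field rule; type promotion between the schemas (e.g. `intnum5` written as int but read as string) is a separate resolution step.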

Regards
Viraj


                
> AVRO: no empty field expressed when loading with AvroStorage using reader 
> schema with extra field that has no default
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3320
>                 URL: https://issues.apache.org/jira/browse/PIG-3320
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.11.2
>            Reporter: Egil Sorensen
>            Assignee: Viraj Bhat
>              Labels: patch
>             Fix For: 0.12, 0.11.2
>
>
> Somewhat different use case than PIG-3318:
> Loading with AvroStorage using a loader schema that, relative to the schema 
> in the Avro file, had an extra field w/o a default; expected to see an extra 
> empty column, but the resulting schema matches the Avro file w/o the extra 
> column.
> E.g. see the e2e style test, which fails on this:
> {code}
>                         {
>                         'num' => 2,
>                         # storing using writer schema
>                         # loading using reader schema with extra field that 
> has no default
>                         'notmq' => 1,
>                         'pig' => q\
> a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: 
> int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: 
> float,doublenum: double);
> -- Store Avro file w. schema
> b1 = foreach a generate id, intnum5;
> c1 = filter b1 by 10 <= id and id < 20;
> describe c1;
> dump c1;
> store c1 into ':OUTPATH:.intermediate_1' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage('
> {
>    "schema" : {  
>       "name" : "schema_writing",
>       "type" : "record",
>       "fields" : [
>          {  
>             "name" : "id",
>             "type" : [
>                "null",
>                "int"
>             ]
>          },
>          {  
>             "name" : "intnum5",
>             "type" : [
>                "null",
>                "int"
>             ]
>          }
>       ]
>    }
> }
> ');
> exec;
> -- Read back what was stored with Avro adding extra field to reader schema
> u = load ':OUTPATH:.intermediate_1' USING 
> org.apache.pig.piggybank.storage.avro.AvroStorage('
> {
>    "debug" : 5,
>    "schema" : {  
>       "name" : "schema_reading",
>       "type" : "record",
>       "fields" : [
>          {  
>             "name" : "id",
>             "type" : [
>                "null",
>                "int"
>             ]
>          },
>          {  
>             "name" : "intnum5",
>             "type" : [
>                "null",
>                "string"
>             ]
>          },
>          {
>             "name" : "intnum100",
>             "type" : [
>                "null",
>                "int"
>             ]
>          }
>       ]
>    }
> }
> ');
> describe u;
> dump u;
> store u into ':OUTPATH:';
> \,
>                         'verify_pig_script' => q\
> a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: 
> int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: 
> float,doublenum: double);
> b = filter a by (10 <= id and id < 20);
> c = foreach b generate id, intnum5, '';
> store c into ':OUTPATH:';
> \,
>                         },
> {code}
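For reference, the Avro specification only lets a reader schema add a field relative to the writer schema when that field declares a default; the extra column is then materialized with the default value. Assuming the intent in the test above is an empty extra column, the reader-schema entry for `intnum100` would need something like:

```json
{
   "name" : "intnum100",
   "type" : [ "null", "int" ],
   "default" : null
}
```

Note also that reading `intnum5` as `["null","string"]` when it was written as `["null","int"]` is not a valid Avro promotion (int promotes to long, float, or double, but not to string), so that field would fail resolution on its own.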

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
