[ https://issues.apache.org/jira/browse/PIG-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683714#comment-13683714 ]

Jeremy Karn commented on PIG-3355:
----------------------------------

I should also mention that this bug manifests itself in a couple of different
ways.  The job generally crashes at the point where the schema no longer
matches the data tuple.  The most common exceptions we've seen look like
this:


java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:159)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getValueTuple(POPackage.java:341)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:264)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:416)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)



2013-06-13 15:28:14,188   java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableBytesWritable
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:845)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:127)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:273)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
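
To illustrate the first failure mode in isolation, here is a minimal sketch
in plain Java. It does not use any Pig classes; the class name, the helper
readAll, and the column names are all hypothetical. It assumes only that the
plan still expects three columns while the tuple carries two, which is the
kind of mismatch that "Index: 2, Size: 2" points at.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a schema/tuple mismatch; not Pig code.
public class SchemaMismatchSketch {

    // Reads every column the schema promises from the tuple. Returns a
    // description of the first column that the tuple does not contain.
    static String readAll(List<String> schema, List<Object> tuple) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < schema.size(); i++) {
            try {
                sb.append(schema.get(i)).append('=').append(tuple.get(i)).append(' ');
            } catch (IndexOutOfBoundsException e) {
                // Same exception type as in the trace: the schema asks for
                // an index that the shorter tuple does not have.
                return "mismatch at column " + i + " (tuple size " + tuple.size() + ")";
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumed shapes: three columns expected, two values present.
        List<String> schema = Arrays.asList("a", "b", "c");
        List<Object> shortTuple = Arrays.<Object>asList("x", 1);
        System.out.println(readAll(schema, shortTuple));
    }
}
```

In the real job nothing catches the exception, so the reduce task dies with
the stack trace shown above.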
                
> ColumnMapKeyPrune bug with distinct operator
> --------------------------------------------
>
>                 Key: PIG-3355
>                 URL: https://issues.apache.org/jira/browse/PIG-3355
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.2, 0.10.1, 0.11.1
>            Reporter: Jeremy Karn
>         Attachments: PIG-3355.patch
>
>
> We came across a bug that happens when you have a distinct operator 
> immediately followed by a union where the result of the union has at least 
> one column that will be pruned by ColumnMapKeyPrune.  There's a test showing 
> an example script in the submitted patch.
