Laxmikanth Samudrala created CRUNCH-338:
-------------------------------------------

             Summary: TupleDeepCopier throws java.lang.ClassCastException: 
java.util.ArrayList cannot be cast to org.apache.avro.generic.IndexedRecord
                 Key: CRUNCH-338
                 URL: https://issues.apache.org/jira/browse/CRUNCH-338
             Project: Crunch
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.8.2
            Reporter: Laxmikanth Samudrala
            Assignee: Josh Wills


Calling parallelDo twice on the same PTable<String, TupleN> causes a 
java.lang.ClassCastException. Calling parallelDo only once on that 
PTable<String, TupleN> does not cause the exception. Converting the 
PTable<String, TupleN> to a 
PTable<String, Pair<Tuple4.Collect<?, ?, ?, ?>, Tuple3.Collect<?, ?, ?>>> and 
calling parallelDo twice on the paired PTable does not cause any exception 
either.

Note : The root cause seems to be related to whether the items passed to the 
TupleN are a collection or a single instance. Surprisingly, performing 
parallelDo once on the PTable<String, TupleN> works with no exceptions, while 
performing parallelDo twice seems to make use of TupleDeepCopier.deepCopy, 
which triggers the exception.
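
The suspected mechanism can be illustrated with a small, self-contained 
sketch. It is only a model of the hypothesis in the note, not Crunch or Avro 
code: RecordLike stands in for org.apache.avro.generic.IndexedRecord, and 
TupleCopySketch, Score, deepCopy, copyFails, singleRecordTuple and 
collectionTuple are hypothetical names. It assumes a copier that treats every 
tuple element as a single record, which fails as soon as one element is an 
ArrayList.

```java
import java.util.ArrayList;
import java.util.List;

public class TupleCopySketch {
    // Hypothetical stand-in for org.apache.avro.generic.IndexedRecord.
    interface RecordLike {
        RecordLike copy();
    }

    // Hypothetical record type; the model name is taken from the report's data.
    static final class Score implements RecordLike {
        final String modelName;
        Score(String modelName) { this.modelName = modelName; }
        @Override public RecordLike copy() { return new Score(modelName); }
    }

    // Assumption: the copier treats every tuple element as a single record.
    static List<Object> deepCopy(List<Object> tuple) {
        List<Object> out = new ArrayList<>();
        for (Object element : tuple) {
            // Throws ClassCastException when element is a collection.
            out.add(((RecordLike) element).copy());
        }
        return out;
    }

    // Returns true when deepCopy fails with a ClassCastException.
    static boolean copyFails(List<Object> tuple) {
        try {
            deepCopy(tuple);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // A tuple whose only element is a single record.
    static List<Object> singleRecordTuple() {
        List<Object> tuple = new ArrayList<>();
        tuple.add(new Score("CXCONLAG0"));
        return tuple;
    }

    // A tuple whose only element is a collection of records.
    static List<Object> collectionTuple() {
        List<Object> scores = new ArrayList<>();
        scores.add(new Score("CXCONLAG0"));
        List<Object> tuple = new ArrayList<>();
        tuple.add(scores);
        return tuple;
    }

    public static void main(String[] args) {
        System.out.println("single record fails: " + copyFails(singleRecordTuple()));
        System.out.println("collection fails: " + copyFails(collectionTuple()));
    }
}
```

In this model only the tuple holding a collection fails, which matches the 
"ArrayList cannot be cast to IndexedRecord" message in the summary. Whether 
TupleDeepCopier behaves this way internally is not confirmed here.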

Template of Code :

Failure case :

PTable<String, TupleN> entityData = .....;
entityData.parallelDo(.....);
entityData.parallelDo(.....);


Success Case :

PTable<String, TupleN> entityData = .....;
entityData.parallelDo(.....);

Another success case :

PTable<String, Pair<Tuple4.Collect<?, ?, ?, ?>, Tuple3.Collect<?, ?, ?>>> entityData;
entityData.parallelDo(.....);
entityData.parallelDo(.....);
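
The difference between one and two parallelDo calls on the PTable<String, 
TupleN> would be consistent with an emitter that deep-copies a value only when 
it has more than one downstream consumer. The sketch below is a simplified 
model of that idea under that assumption, not Crunch's implementation: 
FanOutEmitterSketch, Emitter and copiesFor are hypothetical names, and plain 
strings stand in for the table rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

public class FanOutEmitterSketch {
    // Hypothetical emitter that forwards each value to all downstream consumers.
    static final class Emitter<T> {
        private final List<Consumer<T>> children;
        private final UnaryOperator<T> copier;
        int copies = 0; // counts how often the copier was invoked

        Emitter(List<Consumer<T>> children, UnaryOperator<T> copier) {
            this.children = new ArrayList<>(children);
            this.copier = copier;
        }

        void emit(T value) {
            // Assumption: copies are only needed when the value is shared
            // by more than one consumer.
            boolean needDetachedValue = children.size() > 1;
            for (Consumer<T> child : children) {
                T forwarded = value;
                if (needDetachedValue) {
                    forwarded = copier.apply(value);
                    copies++;
                }
                child.accept(forwarded);
            }
        }
    }

    // Emits one value to the given number of no-op consumers and
    // returns how many copies were made.
    static int copiesFor(int consumers) {
        List<Consumer<String>> children = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            children.add(v -> { });
        }
        Emitter<String> emitter = new Emitter<>(children, v -> new String(v));
        emitter.emit("row");
        return emitter.copies;
    }

    public static void main(String[] args) {
        System.out.println("copies with one consumer: " + copiesFor(1));
        System.out.println("copies with two consumers: " + copiesFor(2));
    }
}
```

With a single consumer the copier is never invoked in this model, so a copier 
that fails on certain values would go unnoticed until a second consumer is 
attached.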

stack trace for reference :

org.apache.crunch.CrunchRuntimeException: Error while deep copying avro value 
[{"empiId": "0ad4f0a3-5d98-408a-ad8d-2341c489ef17", "rawMaraRiskScores": [], 
"normalizedMaraRiskScores": [{"modelName": "CXCONLAG0", "totalScore": 10.0, 
"procedureScore": 0.0, "rxScore": 0.0, "inpatientScore": 0.0, 
"outpatientScore": 0.0, "physicianScore": 0.0, "exposureMonths": 0}, 
{"modelName": "CXCONLAG1", "totalScore": 20.0, "procedureScore": 0.0, 
"rxScore": 0.0, "inpatientScore": 0.0, "outpatientScore": 0.0, 
"physicianScore": 0.0, "exposureMonths": 0}, {"modelName": "CXCONLAG2", 
"totalScore": 30.0, "procedureScore": 0.0, "rxScore": 0.0, "inpatientScore": 
0.0, "outpatientScore": 0.0, "physicianScore": 0.0, "exposureMonths": 0}]}]
        at org.apache.crunch.types.avro.AvroDeepCopier.deepCopy(AvroDeepCopier.java:195)
        at org.apache.crunch.types.avro.AvroDeepCopier$AvroSpecificDeepCopier.deepCopy(AvroDeepCopier.java:83)
        at org.apache.crunch.types.avro.AvroType.getDetachedValue(AvroType.java:217)
        at org.apache.crunch.types.TupleDeepCopier.deepCopy(TupleDeepCopier.java:60)
        at org.apache.crunch.types.TupleDeepCopier.deepCopy(TupleDeepCopier.java:32)
        at org.apache.crunch.types.avro.AvroType.getDetachedValue(AvroType.java:217)
        at org.apache.crunch.lib.PTables.getDetachedValue(PTables.java:191)
        at org.apache.crunch.types.avro.AvroTableType.getDetachedValue(AvroTableType.java:149)
        at org.apache.crunch.types.avro.AvroTableType.getDetachedValue(AvroTableType.java:36)
        at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:54)
        at org.apache.crunch.MapFn.process(MapFn.java:34)
        at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:99)
        at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
        at org.apache.crunch.MapFn.process(MapFn.java:34)
        at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:99)
        at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:114)
        at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:57)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:447)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
