[
https://issues.apache.org/jira/browse/CRUNCH-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891900#comment-13891900
]
Gabriel Reid commented on CRUNCH-338:
-------------------------------------
Thanks for the stack trace. I'm still having a hard time reproducing this issue
in my own code, but there is one thing that would be good to rule out (a
colleague here ran into it recently).
The constructor (and static factory methods) for TupleN take a varargs
parameter, and the issue I saw someone else hit recently was that they were
passing a List to TupleN.of() instead of an array. As long as nothing forces
the TupleN to be serialized, this won't throw an exception, and it's one of
the places where the compiler won't catch anything either. Having two branches
coming off of the same PTable causes the deep copier to be used (which
implicitly uses serialization to perform deep copies), and so it could cause
exactly this error to be thrown.
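To make that concrete, a minimal sketch of the pitfall looks like the
following (just an illustration with made-up values and class name, not code
from this issue):

  import java.util.Arrays;
  import java.util.List;

  import org.apache.crunch.TupleN;

  public class TupleNVarargsCheck {
    public static void main(String[] args) {
      List<Object> fields = Arrays.asList("a", "b", "c");

      // Intended usage: each field becomes one element of the tuple.
      TupleN ok = TupleN.of("a", "b", "c");         // ok.size() == 3
      TupleN alsoOk = TupleN.of(fields.toArray());  // alsoOk.size() == 3

      // Pitfall: the List is passed as a single varargs argument, so the
      // tuple ends up with one field whose value is the List itself. This
      // compiles and behaves normally until the value has to be serialized,
      // at which point the Avro deep copier tries to treat the ArrayList as
      // an IndexedRecord and throws the ClassCastException in the attached
      // stack trace.
      TupleN broken = TupleN.of(fields);            // broken.size() == 1

      System.out.println(ok.size() + " " + alsoOk.size() + " " + broken.size());
    }
  }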
I'm able to replicate the same stack trace by "forgetting" to pass an array
of elements (passing a List instead) when constructing a TupleN, so I'd like to rule
that option out. Can you double-check how the TupleN is being constructed in
your case? And if that looks fine, could you try posting a little example that
demonstrates the error? I'll post the example code that I've put together to
try to replicate this issue.
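For reference, the two-branch shape I'm describing looks roughly like this
(placeholder names and types, not the attached test and not my replication
code):

  import org.apache.crunch.MapFn;
  import org.apache.crunch.PCollection;
  import org.apache.crunch.PTable;
  import org.apache.crunch.Pair;
  import org.apache.crunch.TupleN;
  import org.apache.crunch.types.avro.Avros;

  public class TwoBranchSketch {
    // Two consumers of the same PTable: because entityData has more than one
    // downstream node, each emitted Pair goes through the Avro deep copier
    // before being handed on, which is where a List hidden inside a TupleN
    // field fails.
    public static void consumeTwice(PTable<String, TupleN> entityData) {
      PCollection<String> keys = entityData.parallelDo(
          new MapFn<Pair<String, TupleN>, String>() {
            @Override
            public String map(Pair<String, TupleN> input) {
              return input.first();
            }
          }, Avros.strings());

      PCollection<Integer> arities = entityData.parallelDo(
          new MapFn<Pair<String, TupleN>, Integer>() {
            @Override
            public Integer map(Pair<String, TupleN> input) {
              return input.second().size();
            }
          }, Avros.ints());

      // Materializing (or writing) both branches forces both to be evaluated.
      keys.materialize();
      arities.materialize();
    }
  }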
> TupleDeepCopier throws java.lang.ClassCastException: java.util.ArrayList
> cannot be cast to org.apache.avro.generic.IndexedRecord
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: CRUNCH-338
> URL: https://issues.apache.org/jira/browse/CRUNCH-338
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.8.2
> Reporter: Laxmikanth Samudrala
> Assignee: Josh Wills
> Attachments: ClassCastExceptionInDeepCopierIT.java, stack-trace.log
>
>
> Calling parallelDo twice on the same PTable<String, TupleN> causes a
> java.lang.ClassCastException. Calling parallelDo only once on the same
> PTable<String, TupleN> does not cause the exception, and neither does
> converting the PTable<String, TupleN> to
> PTable<String, Pair<Tuple4.Collect<?, ?, ?, ?>, Tuple3.Collect<?, ?, ?>>>
> and calling parallelDo twice on that paired PTable.
> Note: the root cause seems to be whether the items passed to the TupleN are
> a collection or individual instances. Surprisingly, performing the
> parallelDo operation once on the PTable<String, TupleN> works with no
> exceptions, while performing parallelDo twice appears to go through
> TupleDeepCopier.deepCopy, which triggers the exception.
> Template of Code:
> Failure case:
> PTable<String, TupleN> entityData = .....;
> entityData.parallelDo(.....);
> entityData.parallelDo(.....);
> Success case:
> PTable<String, TupleN> entityData = .....;
> entityData.parallelDo(.....);
> Another success case:
> PTable<String, Pair<Tuple4.Collect<?, ?, ?, ?>, Tuple3.Collect<?, ?, ?>>>
> entityData;
> entityData.parallelDo(.....);
> entityData.parallelDo(.....);
> Stack trace for reference:
> org.apache.crunch.CrunchRuntimeException: Error while deep copying avro value [.........]
>   at org.apache.crunch.types.avro.AvroDeepCopier.deepCopy(AvroDeepCopier.java:195)
>   at org.apache.crunch.types.avro.AvroDeepCopier$AvroSpecificDeepCopier.deepCopy(AvroDeepCopier.java:83)
>   at org.apache.crunch.types.avro.AvroType.getDetachedValue(AvroType.java:217)
>   at org.apache.crunch.types.TupleDeepCopier.deepCopy(TupleDeepCopier.java:60)
>   at org.apache.crunch.types.TupleDeepCopier.deepCopy(TupleDeepCopier.java:32)
>   at org.apache.crunch.types.avro.AvroType.getDetachedValue(AvroType.java:217)
>   at org.apache.crunch.lib.PTables.getDetachedValue(PTables.java:191)
>   at org.apache.crunch.types.avro.AvroTableType.getDetachedValue(AvroTableType.java:149)
>   at org.apache.crunch.types.avro.AvroTableType.getDetachedValue(AvroTableType.java:36)
>   at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:54)
>   at org.apache.crunch.MapFn.process(MapFn.java:34)
>   at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:99)
>   at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
>   at org.apache.crunch.MapFn.process(MapFn.java:34)
>   at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:99)
>   at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:114)
>   at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:57)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:447)