I'm still working on getting the latest master to work with the CDH4.1.x assemblies and am getting the following test failure:

844 [Thread-4] INFO org.apache.crunch.impl.mr.exec.CrunchJob - Running job "org.apache.crunch.lib.join.MultiAvroSchemaJoinIT: [[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/employee8500637597234937594.avro)+S1+joinTagRight]/[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/person7206654078561562436.avro)+S0+joinTagLeft]]+GBK+innerJoinGBK+PTables.values+Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1)"
844 [Thread-4] INFO org.apache.crunch.impl.mr.exec.CrunchJob - Job status available at: http://localhost:8080/
log4j:WARN No appenders could be found for logger (mapreduce.Counters).
log4j:WARN Please initialize the log4j system properly.
1326 [Thread-25] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
org.apache.avro.AvroTypeException: Found {
  "type" : "record",
  "name" : "Employee",
  "namespace" : "org.apache.crunch.test",
  "fields" : [ {
    "name" : "name",
    "type" : [ "string", "null" ]
  }, {
    "name" : "salary",
    "type" : "int"
  }, {
    "name" : "department",
    "type" : [ "string", "null" ]
  } ]
}, expecting {
  "type" : "record",
  "name" : "Person",
  "namespace" : "org.apache.crunch.test",
  "fields" : [ {
    "name" : "name",
    "type" : [ "string", "null" ]
  }, {
    "name" : "age",
    "type" : "int"
  }, {
    "name" : "siblingnames",
    "type" : {
      "type" : "array",
      "items" : "string"
    }
  } ]
}
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:231)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:127)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:169)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
    at org.apache.crunch.types.avro.AvroRecordReader.nextKeyValue(AvroRecordReader.java:83)
    at org.apache.crunch.impl.mr.run.CrunchRecordReader.nextKeyValue(CrunchRecordReader.java:72)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:458)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:263)
1 job failure(s) occurred:
org.apache.crunch.lib.join.MultiAvroSchemaJoinIT: [[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/employee8500637597234937594.avro)+S1+joinTagRight]/[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/person7206654078561562436.avro)+S0+joinTagLeft]]+GBK+innerJoinGBK+PTables.values+Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1)(class org.apache.crunch.lib.join.MultiAvroSchemaJoinIT0): Job failed!
2194 [main] ERROR org.apache.crunch.materialize.MaterializableIterable - Could not materialize: Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1)
java.io.IOException: No files found to materialize at: /var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1
    at org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:48)
    at org.apache.crunch.io.avro.AvroFileSource.read(AvroFileSource.java:56)
    at org.apache.crunch.io.impl.ReadableSourcePathTargetImpl.read(ReadableSourcePathTargetImpl.java:36)
    at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:67)
    at org.apache.crunch.impl.mr.MRPipeline.run(MRPipeline.java:158)
    at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:59)
    at com.google.common.collect.Lists.newArrayList(Lists.java:119)
    at org.apache.crunch.lib.join.MultiAvroSchemaJoinIT.testJoin(MultiAvroSchemaJoinIT.java:117)

Since I can view the files the test creates as inputs, the AvroTypeException seems to indicate that it is having trouble reading a different file, which I assume is the intermediate state created by the join or the materialize. The problem is that I can't inspect those files because they are cleaned up when the test completes. I've added a Thread.sleep after the exception is thrown to delay completion and cleanup. The oddity is that, while the test is sleeping, the directory it is attempting to materialize doesn't exist, but the following one does:

/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p2

The only difference is the final path component (p2 instead of p1). The p2 directory contains MAP, REDUCE, and output/. The name of the MR job seems to indicate that the values should be pulled from the p1 directory.

So how can I figure out what is getting written to p2? As for the directory difference, is there a bug causing the mismatch between the expected directory (p1) and the one that actually exists (p2)? Is there another way to preserve the intermediate state throughout processing for debugging?
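
One way to check what ends up in p2 while the test is paused would be to print the schema stored in the header of each file under it. The following is only a minimal sketch, assuming the files under p2/output are ordinary Avro data files (I have not confirmed that). The class name AvroHeaderSchemaDump and its methods are made up for illustration and are not part of Crunch or Avro. It uses only the JDK and reads the header by hand, following the Avro container file layout: the magic bytes "Obj" plus 0x01, then a metadata map that holds the schema JSON under the key "avro.schema".

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: prints the writer schema embedded in an Avro data file header. */
public class AvroHeaderSchemaDump {

    // Reads one Avro "long": a zigzag-encoded variable-length integer.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1); // undo the zigzag encoding
    }

    // Reads a length-prefixed byte sequence (Avro "bytes" and "string" share this layout).
    static byte[] readBytes(DataInputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = new byte[(int) len];
        in.readFully(buf);
        return buf;
    }

    /** Returns the JSON stored under the "avro.schema" metadata key, or null if absent. */
    static String readSchemaJson(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro data file");
        }
        String schema = null;
        // The metadata is a map<string, bytes>, written as blocks ended by a zero count.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                count = -count;
                readLong(in); // a negative count is followed by the block size in bytes; skip it
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                byte[] value = readBytes(in);
                if (key.equals("avro.schema")) {
                    schema = new String(value, StandardCharsets.UTF_8);
                }
            }
        }
        return schema;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a file to inspect.
        for (String path : args) {
            try (InputStream in = new FileInputStream(path)) {
                System.out.println(path + ": " + readSchemaJson(in));
            }
        }
    }
}
```

Running it with the paths of the files found under p2/output as arguments should show whether they were written with the Employee schema, the Person schema, or something else. It only reads the header, so it would not show the records themselves.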
