I'm still working on making the latest master work with the CDH4.1.x
assemblies and am getting the following test failure:

844  [Thread-4] INFO  org.apache.crunch.impl.mr.exec.CrunchJob  - Running job "org.apache.crunch.lib.join.MultiAvroSchemaJoinIT: [[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/employee8500637597234937594.avro)+S1+joinTagRight]/[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/person7206654078561562436.avro)+S0+joinTagLeft]]+GBK+innerJoinGBK+PTables.values+Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1)"
844  [Thread-4] INFO  org.apache.crunch.impl.mr.exec.CrunchJob  - Job status available at: http://localhost:8080/
log4j:WARN No appenders could be found for logger (mapreduce.Counters).
log4j:WARN Please initialize the log4j system properly.
1326 [Thread-25] WARN  org.apache.hadoop.mapred.LocalJobRunner  - job_local_0001
org.apache.avro.AvroTypeException: Found {
  "type" : "record",
  "name" : "Employee",
  "namespace" : "org.apache.crunch.test",
  "fields" : [ {
    "name" : "name",
    "type" : [ "string", "null" ]
  }, {
    "name" : "salary",
    "type" : "int"
  }, {
    "name" : "department",
    "type" : [ "string", "null" ]
  } ]
}, expecting {
  "type" : "record",
  "name" : "Person",
  "namespace" : "org.apache.crunch.test",
  "fields" : [ {
    "name" : "name",
    "type" : [ "string", "null" ]
  }, {
    "name" : "age",
    "type" : "int"
  }, {
    "name" : "siblingnames",
    "type" : {
      "type" : "array",
      "items" : "string"
    }
  } ]
}
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:231)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
        at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:127)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:169)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
        at org.apache.crunch.types.avro.AvroRecordReader.nextKeyValue(AvroRecordReader.java:83)
        at org.apache.crunch.impl.mr.run.CrunchRecordReader.nextKeyValue(CrunchRecordReader.java:72)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:458)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:263)
1 job failure(s) occurred:
org.apache.crunch.lib.join.MultiAvroSchemaJoinIT: [[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/employee8500637597234937594.avro)+S1+joinTagRight]/[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/person7206654078561562436.avro)+S0+joinTagLeft]]+GBK+innerJoinGBK+PTables.values+Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1)(class org.apache.crunch.lib.join.MultiAvroSchemaJoinIT0): Job failed!
2194 [main] ERROR org.apache.crunch.materialize.MaterializableIterable  - Could not materialize: Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1)
java.io.IOException: No files found to materialize at: /var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1
        at org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:48)
        at org.apache.crunch.io.avro.AvroFileSource.read(AvroFileSource.java:56)
        at org.apache.crunch.io.impl.ReadableSourcePathTargetImpl.read(ReadableSourcePathTargetImpl.java:36)
        at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:67)
        at org.apache.crunch.impl.mr.MRPipeline.run(MRPipeline.java:158)
        at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:59)
        at com.google.common.collect.Lists.newArrayList(Lists.java:119)
        at org.apache.crunch.lib.join.MultiAvroSchemaJoinIT.testJoin(MultiAvroSchemaJoinIT.java:117)

Since I can view the files created as inputs for the tests, the
AvroTypeException seems to indicate trouble reading a different file,
which I assume is the intermediate state created by the join or the
materialize.  The problem is that I can't view those files because
they are cleaned up when the test completes.  I've added a
Thread.sleep after the exception is thrown to delay completion and
cleanup.  The oddity is that when I look, during the sleep, at the
directory it is attempting to materialize, that directory doesn't
exist; however, the following one does:

/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p2

The only difference is the final directory (p2 rather than p1), which
contains MAP, REDUCE, and output/.  The name of the MR job would seem
to indicate that the values should be pulled from the p1 directory.
So how can I figure out what is getting written to p2?  And regarding
the different directory, is there a bug causing the mismatch between
the anticipated directory and the actual one (p1 vs p2)?
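
For reference, a minimal sketch of how the contents of the temp directory could be dumped while the test is paused. It uses only the JDK and none of the Crunch API. The class name TmpDirLister and its listFiles method are hypothetical, not something that exists in Crunch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper (not part of Crunch): lists every regular file
// under a directory so p1/p2 contents can be compared at a glance.
public class TmpDirLister {

    /** Returns "relative/path (N bytes)" for each regular file under root, sorted. */
    public static List<String> listFiles(Path root) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)) {
                    lines.add(root.relativize(p) + " (" + Files.size(p) + " bytes)");
                }
            }
        }
        Collections.sort(lines);
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: TmpDirLister <directory>");
            return;
        }
        // args[0] would be the crunch-NNN directory from the log output.
        for (String line : listFiles(Paths.get(args[0]))) {
            System.out.println(line);
        }
    }
}
```

Pointing this at the crunch-2139922377 directory during the sleep should show which of MAP, REDUCE, and output/ under p2 actually contain data files.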

Is there another way to save the intermediate state throughout the
processing for debugging?
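
One JDK-only alternative to the Thread.sleep approach, sketched here as an assumption and not as a Crunch feature, would be to copy the whole temp tree somewhere the test cleanup won't touch. TmpDirSnapshot and copyTree are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper (not part of Crunch): snapshots a directory tree
// so intermediate files survive the test's cleanup.
public class TmpDirSnapshot {

    /** Recursively copies src into dest, creating dest if needed. */
    public static void copyTree(final Path src, final Path dest) throws IOException {
        Files.walkFileTree(src, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // Mirror each directory before its files are visited.
                Files.createDirectories(dest.resolve(src.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.copy(file, dest.resolve(src.relativize(file)),
                        StandardCopyOption.REPLACE_EXISTING);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Calling copyTree from a catch or finally block in the test, before whatever cleanup the harness performs, would leave a copy of p2 (and p1, if it ever appears) to inspect afterwards.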
