Viewing intermediate states for debugging

Micah Whitacre Wed, 23 Jan 2013 15:40:04 -0800

I'm still working on making the latest master work with the CDH4.1.x
assemblies and am getting the following test failure:


844  [Thread-4] INFO  org.apache.crunch.impl.mr.exec.CrunchJob  -
Running job "org.apache.crunch.lib.join.MultiAvroSchemaJoinIT:
[[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/employee8500637597234937594.avro)+S1+joinTagRight]/[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/person7206654078561562436.avro)+S0+joinTagLeft]]+GBK+innerJoinGBK+PTables.values+Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1)"
844  [Thread-4] INFO  org.apache.crunch.impl.mr.exec.CrunchJob  - Job
status available at: http://localhost:8080/
log4j:WARN No appenders could be found for logger (mapreduce.Counters).
log4j:WARN Please initialize the log4j system properly.
1326 [Thread-25] WARN  org.apache.hadoop.mapred.LocalJobRunner  - job_local_0001
org.apache.avro.AvroTypeException: Found {
  "type" : "record",
  "name" : "Employee",
  "namespace" : "org.apache.crunch.test",
  "fields" : [ {
    "name" : "name",
    "type" : [ "string", "null" ]
  }, {
    "name" : "salary",
    "type" : "int"
  }, {
    "name" : "department",
    "type" : [ "string", "null" ]
  } ]
}, expecting {
  "type" : "record",
  "name" : "Person",
  "namespace" : "org.apache.crunch.test",
  "fields" : [ {
    "name" : "name",
    "type" : [ "string", "null" ]
  }, {
    "name" : "age",
    "type" : "int"
  }, {
    "name" : "siblingnames",
    "type" : {
      "type" : "array",
      "items" : "string"
    }
  } ]
}
        at 
org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:231)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
        at 
org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:127)
        at 
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:169)
        at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
        at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:135)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
        at 
org.apache.crunch.types.avro.AvroRecordReader.nextKeyValue(AvroRecordReader.java:83)
        at 
org.apache.crunch.impl.mr.run.CrunchRecordReader.nextKeyValue(CrunchRecordReader.java:72)
        at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:458)
        at 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
        at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:263)
1 job failure(s) occurred:
org.apache.crunch.lib.join.MultiAvroSchemaJoinIT:
[[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/employee8500637597234937594.avro)+S1+joinTagRight]/[Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/person7206654078561562436.avro)+S0+joinTagLeft]]+GBK+innerJoinGBK+PTables.values+Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1)(class
org.apache.crunch.lib.join.MultiAvroSchemaJoinIT0): Job failed!
2194 [main] ERROR org.apache.crunch.materialize.MaterializableIterable
 - Could not materialize:
Avro(/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1)
java.io.IOException: No files found to materialize at:
/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p1
        at 
org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:48)
        at org.apache.crunch.io.avro.AvroFileSource.read(AvroFileSource.java:56)
        at 
org.apache.crunch.io.impl.ReadableSourcePathTargetImpl.read(ReadableSourcePathTargetImpl.java:36)
        at 
org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:67)
        at org.apache.crunch.impl.mr.MRPipeline.run(MRPipeline.java:158)
        at 
org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:59)
        at com.google.common.collect.Lists.newArrayList(Lists.java:119)
        at 
org.apache.crunch.lib.join.MultiAvroSchemaJoinIT.testJoin(MultiAvroSchemaJoinIT.java:117)

Since I can view the input files created for the test, the
AvroTypeException seems to indicate that the job is having trouble
reading a different file, which I assume is the intermediate state
created by the join or the materialize.  The issue is that I can't
view those files because they are cleaned up when the test completes.
I've added a Thread.sleep after the exception is thrown to delay
completion and cleanup.  The odd thing is that when I look, during the
sleep, for the directory it is attempting to materialize, that
directory doesn't exist; however, the following one does:

/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6053160028799274805/tmp-crunch.tmp.dir/crunch-2139922377/p2

The only difference is the final path component (p2 instead of p1);
the p2 directory contains MAP, REDUCE, and output/.  The name of the MR
job seems to indicate that the values should be read from the p1
directory.  So how can I figure out what is getting written to p2?
And regarding the mismatch between the expected directory (p1) and the
one that actually exists (p2), is there a bug that could be causing it?

Is there another way to save the intermediate state during processing
for debugging?
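
As a stopgap in place of the Thread.sleep, I could copy the temp
directory aside before the cleanup runs.  Below is a minimal sketch of
that idea using only the JDK; IntermediateStateSnapshot and snapshot()
are hypothetical names of my own and not part of Crunch, and it assumes
the intermediate files are on the local filesystem, as they are with
the LocalJobRunner run above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical debugging helper (not part of Crunch): copies a directory
// tree to another location so it survives the test's temp-dir cleanup.
class IntermediateStateSnapshot {

    /** Recursively copies everything under src into dest and returns dest. */
    static Path snapshot(Path src, Path dest) throws IOException {
        try (Stream<Path> paths = Files.walk(src)) {
            // Files.walk visits a directory before its contents, so each
            // parent directory is created before its children are copied.
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path target = dest.resolve(src.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Path parent = target.getParent();
                    if (parent != null) {
                        Files.createDirectories(parent);
                    }
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return dest;
    }
}
```

The idea would be to call snapshot() from a catch block around the
failing call in the test, passing the tmp-crunch.tmp.dir directory and
some location outside the JUnit temp folder, and then rethrow.  That
only preserves whatever is on disk at the moment of the failure, so it
would not explain the p1 vs. p2 difference by itself.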
