Hey Dave,

Agree with your assessment: we should fail fast in this case. I'm adding a JIRA issue with a patch for it.
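
To make "fail fast" concrete, here is a rough sketch of an up-front compatibility check that rejects a bad target when write() is called, rather than at job submission. Every name in it (FailFastSketch, RecordFamily, TargetKind, checkCompatible) is hypothetical and is not a Crunch class or method. The compatibility rules are assumptions that only mirror the behavior described in this thread, not how Crunch actually decides.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch only: none of these names are Crunch APIs.
class FailFastSketch {

    // Stand-in for how a collection's records are serialized.
    enum RecordFamily { WRITABLE, AVRO }

    // Stand-in for the three output targets used in the pipeline.
    // The accepted families are assumed for illustration.
    enum TargetKind {
        TEXT(EnumSet.of(RecordFamily.WRITABLE)),
        SEQUENCE(EnumSet.of(RecordFamily.WRITABLE)),
        AVRO(EnumSet.of(RecordFamily.AVRO));

        private final Set<RecordFamily> accepted;

        TargetKind(Set<RecordFamily> accepted) {
            this.accepted = accepted;
        }

        boolean accepts(RecordFamily family) {
            return accepted.contains(family);
        }
    }

    // Called when the write is requested, before any job is planned,
    // so the user sees a message about the real problem.
    static void checkCompatible(RecordFamily family, TargetKind target) {
        if (!target.accepts(family)) {
            throw new IllegalArgumentException(
                "Target " + target + " cannot accept a collection of "
                + family + " records");
        }
    }

    public static void main(String[] args) {
        // Text input written to text and sequence targets passes the check.
        checkCompatible(RecordFamily.WRITABLE, TargetKind.TEXT);
        checkCompatible(RecordFamily.WRITABLE, TargetKind.SEQUENCE);
        try {
            // Text input written to an Avro target is rejected immediately.
            checkCompatible(RecordFamily.WRITABLE, TargetKind.AVRO);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected early: " + e.getMessage());
        }
    }
}
```

The point of the design is only where the exception is raised: at the write call, with a message naming the target and the collection, instead of surfacing later as "Output directory not set."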

J

On Fri, Jun 28, 2013 at 8:07 AM, Dave Beech <[email protected]> wrote:

> Hi all,
>
> Please take a look at the following pipeline:
>
> read(From.textFile(args[0])).write(To.textFile(args[1] + "-text"));
> run();
> read(From.textFile(args[0])).write(To.sequenceFile(args[1] + "-seq"));
> run();
> read(From.textFile(args[0])).write(To.avroFile(args[1] + "-avro"));
> done();
>
> The first two jobs are fine, and give correct output types of text and
> sequence files respectively. The text to avro conversion fails. This is no
> great surprise, knowing a little about the internals of Crunch, but when
> put alongside the other examples it feels like it should work.
>
> Even if it can't work - no big deal, it's just a toy example. The main
> problem for me was the error message:
>
> 13/06/28 14:11:40 INFO jobcontrol.CrunchControlledJob:
> org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
> at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
>
> I think the job should have been killed somewhere before this point. There
> must be a bit of logic (though I haven't properly looked for it) which
> decides the requested target is no good for the PCollection provided, so
> the exception should be raised there with a message explaining this.
>
> What do you think?
>
> I'm sure there's a JIRA ticket lurking somewhere in all this - I'm just not
> sure what it is! :)
>
> Thanks,
> Dave
>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
