Doug Cutting wrote:

 Jérôme Charron wrote:

> In my environment, the crawl command terminates with the following error:
>
> 2006-07-06 17:41:49,735 ERROR mapred.JobClient (JobClient.java:submitJob(273)) - Input directory /localpath/crawl/crawldb/current in local is invalid.
> Exception in thread "main" java.io.IOException: Input directory /localpath/crawl/crawldb/current in local is invalid.
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:146)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)


 Hadoop 0.4.0 by default requires all input directories to exist,
 whereas previous releases did not. So we need to either create an
 empty "current" directory or change the InputFormat used in
 CrawlDb.createJob() to one that overrides
 InputFormat.areValidInputDirectories(). The former is probably
 easier. I've attached a patch. Does this fix things for folks?


Patch works for me.
--
Sami Siren
