Re: Error with Hadoop-0.4.0

Stefan Groschupf Fri, 07 Jul 2006 08:15:30 -0700

Hi Jérôme,

I have the same problem in a distributed environment! :-(
So I think I can confirm this is a bug.
We should fix that.


Stefan

On 06.07.2006, at 08:54, Jérôme Charron wrote:

Hi,

I encountered some problems with the Nutch trunk version.
In fact, they seem to be related to the changes for Hadoop-0.4.0 and JDK 1.5
(more precisely, since HADOOP-129 and the replacement of File by Path).
In my environment, the crawl command terminates with the following error:

2006-07-06 17:41:49,735 ERROR mapred.JobClient (JobClient.java:submitJob(273)) - Input directory /localpath/crawl/crawldb/current in local is invalid.
Exception in thread "main" java.io.IOException: Input directory /localpathcrawl/crawldb/current in local is invalid.
       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
       at org.apache.nutch.crawl.Injector.inject(Injector.java:146)
       at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)

Looking at the Nutch code, simply changing line 145 of Injector to
mergeJob.setInputPath(tempDir) (instead of mergeJob.addInputPath(tempDir))
makes everything work fine. After taking a closer look at the CrawlDb code,
I still don't understand why there is the following line in the createJob method:
job.addInputPath(new Path(crawlDb, CrawlDatum.DB_DIR_NAME));
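
To illustrate the difference between the two calls, here is a toy model. It is not Hadoop's JobConf: the class name InputPathsSketch and the string paths are hypothetical, and it assumes that addInputPath appends to the inputs already registered while setInputPath replaces them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a job configuration; NOT Hadoop's JobConf.
// Assumption: "add" appends to the input list, "set" replaces it.
final class InputPathsSketch {
    private final List<String> inputPaths = new ArrayList<>();

    // Appends a path, keeping whatever was registered before.
    void addInputPath(String path) {
        inputPaths.add(path);
    }

    // Discards all earlier paths and keeps only this one.
    void setInputPath(String path) {
        inputPaths.clear();
        inputPaths.add(path);
    }

    // Returns a copy so callers cannot modify the internal list.
    List<String> getInputPaths() {
        return new ArrayList<>(inputPaths);
    }

    public static void main(String[] args) {
        InputPathsSketch job = new InputPathsSketch();
        job.addInputPath("crawldb/current"); // mirrors the createJob line quoted above
        job.addInputPath("tempDir");         // mirrors the original Injector line 145
        System.out.println(job.getInputPaths());

        job.setInputPath("tempDir");         // mirrors the proposed change
        System.out.println(job.getInputPaths());
    }
}
```

Under that assumption, switching to setInputPath would drop the crawldb/current input that createJob registers, which may be why the "Input directory ... is invalid" error disappears.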

Out of curiosity, could a Hadoop guru explain why there is such a
regression?

Does anybody else have the same error?

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
