Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "NutchGotchas" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/NutchGotchas?action=diff&rev1=4&rev2=5

Comment:
more gotchas

== Current Gotchas and using them: ==
- '''No agents listed in 'http.agent.name' property.''':
+ '''No agents listed in 'http.agent.name' property''':

  Since 1.3, Nutch is run from either of the runtime directories (runtime/local and runtime/deploy). The conf files should therefore be modified in runtime/local/conf, not in $NUTCH_HOME/conf.

@@ -37, +37 @@

   * During the crawl command, as explained [[http://wiki.apache.org/nutch/RunningNutchAndSolr#A3._Crawl_your_first_website|here]],
   * or during the later stage of sending an individual solrindex command to Solr, as explained [[http://wiki.apache.org/nutch/RunningNutchAndSolr#A6._Integrate_Solr_with_Nutch|here]].

+ '''DiskErrorException while fetching''':
+
+ Questions like this one arise fairly regularly on the user@ list:
+
+ {{{
+ Hello,
+
+ I am getting some exception while fetching:
+
+ 2011-07-10 23:25:21,427 WARN mapred.LocalJobRunner - job_local_0001
+ org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
+ taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill0.out
+ in any of the configured local directories
+ at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
+ at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
+ at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
+ at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
+ at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
+ at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
+ at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
+ at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
+ 2011-07-10 23:25:22,279 FATAL fetcher.Fetcher - Fetcher:
+ java.io.IOException: Job failed!
+ at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
+ at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1107)
+ at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1145)
+ at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
+ at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1116)
+
+ What should I do? What happens if I restart the fetch job?
+ }}}
+
+ The answer that usually addresses this situation is that you're most likely out of disk space in /tmp. Consider using another location, or possibly another partition, for hadoop.tmp.dir (which can be set in nutch-site.xml) with plenty of room for large transient files, or use a Hadoop cluster.
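Both of the gotchas above come down to editing nutch-site.xml in runtime/local/conf. A minimal sketch of what such a file might look like is below; the agent name and the directory path are placeholder examples, not recommended values, so substitute your own:

{{{
<?xml version="1.0"?>
<configuration>
  <!-- Identify your crawler to the sites it fetches.
       Placeholder value - choose your own agent name. -->
  <property>
    <name>http.agent.name</name>
    <value>MyNutchSpider</value>
  </property>

  <!-- Point Hadoop's temporary/spill files at a location with plenty
       of free space instead of the default under /tmp.
       Placeholder path - use a partition with enough room. -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-tmp</value>
  </property>
</configuration>
}}}

Remember that since 1.3 this file lives under the runtime directory (e.g. runtime/local/conf/nutch-site.xml), not $NUTCH_HOME/conf.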

