Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "NutchGotchas" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/NutchGotchas?action=diff&rev1=4&rev2=5

Comment:
more gotchas

== Current Gotchas and using them: ==
- '''No agents listed in 'http.agent.name' property.''':
+ '''No agents listed in 'http.agent.name' property''':

  Since 1.3, Nutch is run from either of the runtime directories (runtime/local and runtime/deploy). The conf files should therefore be modified in runtime/local/conf, not in $NUTCH_HOME/conf.

@@ -37, +37 @@

   * During the crawl command, as explained [[http://wiki.apache.org/nutch/RunningNutchAndSolr#A3._Crawl_your_first_website|here]],
   * or during the later stage of sending an individual solrindex command to Solr, as explained [[http://wiki.apache.org/nutch/RunningNutchAndSolr#A6._Integrate_Solr_with_Nutch|here]].

+ '''DiskErrorException while fetching''':
+
+ Questions like this one arise fairly regularly on the user@ list:
+
+ {{{
+ Hello,
+
+ I am getting some exception while fetching:
+
+ 2011-07-10 23:25:21,427 WARN mapred.LocalJobRunner - job_local_0001
+ org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
+ taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill0.out
+ in any of the configured local directories
+ at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
+ at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
+ at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
+ at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
+ at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
+ at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
+ at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
+ at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
+ 2011-07-10 23:25:22,279 FATAL fetcher.Fetcher - Fetcher:
+ java.io.IOException: Job failed!
+ at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
+ at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1107)
+ at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1145)
+ at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
+ at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1116)
+
+ What should I do? What happens if I restart the fetch job?
+ }}}
+
+ The answer that usually addresses this situation is that you're most likely out of disk space in /tmp. Consider using another location, or possibly another partition, for hadoop.tmp.dir (which can be set in nutch-site.xml) with plenty of room for large transient files, or use a Hadoop cluster.
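Both of the gotchas above come down to editing nutch-site.xml in runtime/local/conf. A minimal sketch of what such a file might look like is below; the agent name and the directory path are placeholder examples, not recommended values, so substitute your own:

{{{
<?xml version="1.0"?>
<configuration>
  <!-- Identify your crawler to the sites it fetches.
       Placeholder value - choose your own agent name. -->
  <property>
    <name>http.agent.name</name>
    <value>MyNutchSpider</value>
  </property>

  <!-- Point Hadoop's temporary/spill files at a location with plenty
       of free space instead of the default under /tmp.
       Placeholder path - use a partition with enough room. -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-tmp</value>
  </property>
</configuration>
}}}

Remember that since 1.3 this file lives under the runtime directory (e.g. runtime/local/conf/nutch-site.xml), not $NUTCH_HOME/conf.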

