Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "ErrorMessagesInNutch2" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/ErrorMessagesInNutch2?action=diff&rev1=3&rev2=4

  
  == Missing plugins whilst running Nutch 2.0 on Cloudera's CDH3 ==
  
  Cloudera's CDH3 is Cloudera's distribution including Apache Hadoop. More information can be found [[https://ccp.cloudera.com/display/CDHDOC/CDH3+Quick+Start+Guide|here]]. This common error is caused by MAPREDUCE-967, which changed the way MapReduce unpacks the job's jar: previously the whole jar was unpacked, but now only classes/ and lib/ are, so Nutch is left without its plugins/ directory. A workaround is to force unpacking of the plugins/ directory by adding the following properties to nutch-site.xml:
  {{{
  <property>
  <name>mapreduce.job.jar.unpack.pattern</name>
  ...
  <value>${job.local.dir}/../jars/plugins</value>
  </property>
  }}}
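  The property block above is only partially shown here. Assuming the usual MAPREDUCE-967 workaround, the complete addition to nutch-site.xml would look roughly like the sketch below; the unpack-pattern regex and the plugin.folders property name are assumptions.
  {{{
  <property>
    <!-- unpack plugins/ in addition to classes/ and lib/ (regex value is an assumption) -->
    <name>mapreduce.job.jar.unpack.pattern</name>
    <value>(?:classes/|lib/|plugins/).*</value>
  </property>
  <property>
    <!-- point Nutch at the unpacked plugin directory (property name is an assumption) -->
    <name>plugin.folders</name>
    <value>${job.local.dir}/../jars/plugins</value>
  </property>
  }}}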
  In addition, hue-plugins-1.2.0-cdh3u1.jar must be removed from the Hadoop lib folder (e.g. /usr/lib/hadoop-0.20/lib).
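  A minimal sketch of that step, assuming the CDH3 layout above (the backup destination is arbitrary):
  {{{
  # move the conflicting Hue plugin jar out of the Hadoop classpath
  mv /usr/lib/hadoop-0.20/lib/hue-plugins-1.2.0-cdh3u1.jar /tmp/
  }}}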
  
  It is then necessary to recreate the Nutch job file using ant. Finally, set HADOOP_OPTS="-Djob.local.dir=/<MY HOME>/nutch/plugins" in hadoop-env.sh.
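  Roughly, assuming the Nutch source lives under /<MY HOME>/nutch (the path is an assumption), the remaining steps look like:
  {{{
  # rebuild the Nutch job file so it picks up the nutch-site.xml changes
  cd /<MY HOME>/nutch
  ant
  
  # then add this line to hadoop-env.sh so job.local.dir points at the plugins directory
  export HADOOP_OPTS="-Djob.local.dir=/<MY HOME>/nutch/plugins"
  }}}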
  
  Although this is a nasty workaround, it does work.
