Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "ErrorMessagesInNutch2" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/ErrorMessagesInNutch2?action=diff&rev1=2&rev2=3

  
  When using different Gora storage mechanisms we have to manually tweak the 
Nutch Ivy configuration depending on the choice of Gora store, in this case 
Cassandra.
  
- This is what was added to ivy/ivy.xml:
+ To resolve this error, the following was added to $NUTCH_HOME/ivy/ivy.xml:
  {{{
  <dependency org="org.apache.gora" name="gora-cassandra" rev="0.2-incubating" 
conf="*->compile"/>
  <dependency org="org.apache.cassandra" name="cassandra-thrift" rev="0.8.1"/>
@@ -51, +51 @@

  <dependency org="org.apache.cassandra" name="apache-cassandra" rev="0.8.1"/>
  <dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2"/>
  }}}
+ Then the following ant commands were executed:
+ {{{
+ $ ant clean
+ $ ant
+ }}}
+ This made Ivy download the correct dependencies, which were then bundled 
into the nutch-2.0-dev.job file.
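
One way to confirm that the rebuilt job file really bundles the new dependencies is to list its entries, since a .job file is an ordinary zip archive. The following is a minimal sketch, not part of Nutch: the class name JobFileCheck and the entry prefix lib/gora-cassandra are assumptions made for illustration, and the path to the job file must be supplied by the user.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not part of Nutch): inspects a job file as a zip archive.
class JobFileCheck {

    // Returns true if any entry in the archive has a name starting with the given prefix.
    static boolean containsEntry(String archivePath, String prefix) throws IOException {
        try (ZipFile zip = new ZipFile(archivePath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                if (entries.nextElement().getName().startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JobFileCheck <path-to-job-file>");
            return;
        }
        // Assumption: bundled dependency jars are stored under lib/ in the job file.
        String prefix = "lib/gora-cassandra";
        System.out.println(prefix + " present: " + containsEntry(args[0], prefix));
    }
}
```

If the check reports that the entry is absent, the ivy.xml change was probably not picked up and the job file should be rebuilt after another ant clean.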
  
  In this particular case it was mentioned that Cloudera CDH3 was being used. 
It ships a Hue plugins jar that contains an older Thrift library, so removing 
this jar from the classpath resolved further errors when running Nutch in 
distributed mode.
  
  Correspondence on this error can be seen in context 
[[http://www.mail-archive.com/dev%40nutch.apache.org/msg03482.html|here]] 
  
+ == Missing plugins whilst running Nutch 2.0 on Cloudera's CDH3 ==
+ 
+ CDH3 is Cloudera's distribution including Apache Hadoop. More 
information can be found 
[[https://ccp.cloudera.com/display/CDHDOC/CDH3+Quick+Start+Guide|here]]. This 
common error is caused by MAPREDUCE-967, which changed the way MapReduce 
unpacks the job's jar. Previously the whole jar was unpacked; now only 
classes/ and lib/ are unpacked, so Nutch is missing the plugins/ directory. 
A workaround is to force unpacking of the plugins/ directory. This can be 
done by adding the following properties to nutch-site.xml:
+ {{{
+ <property>
+ <name>mapreduce.job.jar.unpack.pattern</name>
+ <value>(?:classes/|lib/|plugins/).*</value>
+ </property>
+ 
+ <property>
+ <name>plugin.folders</name>
+ <value>${job.local.dir}/../jars/plugins</value>
+ </property>
+ }}}
+ It is then necessary to recreate the Nutch job file using ant.
+ 
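The effect of the widened unpack pattern can be illustrated with plain java.util.regex. This is a sketch under assumptions, not the Hadoop implementation: it assumes that an entry is unpacked only when its full name matches the pattern, that the default pattern covers just classes/ and lib/ as described above, and the entry names are made up for illustration.

```java
import java.util.regex.Pattern;

// Sketch only: approximates how a job-jar entry name is tested against the unpack pattern.
class UnpackPatternDemo {

    // Assumed default behaviour as described above: only classes/ and lib/ are unpacked.
    static final Pattern DEFAULT_PATTERN = Pattern.compile("(?:classes/|lib/).*");

    // Pattern from the nutch-site.xml workaround: plugins/ is unpacked as well.
    static final Pattern WORKAROUND_PATTERN = Pattern.compile("(?:classes/|lib/|plugins/).*");

    // Assumption: the whole entry name must match the pattern for the entry to be unpacked.
    static boolean isUnpacked(Pattern pattern, String entryName) {
        return pattern.matcher(entryName).matches();
    }

    public static void main(String[] args) {
        // Hypothetical entry names, chosen only to exercise each branch of the pattern.
        String[] entries = {
            "classes/org/example/SomeJob.class",
            "lib/some-dependency.jar",
            "plugins/parse-html/plugin.xml",
            "META-INF/MANIFEST.MF"
        };
        for (String entry : entries) {
            System.out.printf("%-40s default=%b workaround=%b%n",
                    entry,
                    isUnpacked(DEFAULT_PATTERN, entry),
                    isUnpacked(WORKAROUND_PATTERN, entry));
        }
    }
}
```

Under these assumptions, the plugins/ entry is the only one whose result differs between the two patterns, which is why the plugin.folders property must then point at the unpacked location.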
