Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "Nutch2Tutorial" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/Nutch2Tutorial?action=diff&rev1=9&rev2=10

  
  This document describes how to get Nutch 2.0 to use HBase as a storage backend for Gora.
  
-  * Grab a distribution of Nutch 2.X from [[http://www.apache.org/dyn/closer.cgi/nutch/|here]]
+  * Grab the latest distribution of Nutch 2.X from [[http://www.apache.org/dyn/closer.cgi/nutch/|here]]
-  * Install and configure HBase. You can get it [[http://archive.apache.org/dist/hbase/|here]] ('''N.B.''' Gora 0.2 uses HBase 0.90.4, however the setup is known to work with more recent versions of the HBase 0.90.x branch)
+  * Install and configure HBase. You can get it [[http://archive.apache.org/dist/hbase/|here]] ('''N.B.''' Gora 0.3 uses HBase 0.90.4; however, the setup is known to work with more recent versions of the HBase 0.90.x branch.)
   * Specify the Gora backend in nutch-site.xml
  
  {{{
@@ -22, +22 @@

  {{{
      <!-- Uncomment this to use HBase as Gora backend. -->
      
-     <dependency org="org.apache.gora" name="gora-hbase" rev="0.2" conf="*->default" />
+     <dependency org="org.apache.gora" name="gora-hbase" rev="0.3" conf="*->default" />
  }}}
  
   * Ensure that HBaseStore is set as the default datastore in gora.properties
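A minimal shell sketch of this step, assuming the file lives at ''$NUTCH_HOME/conf/gora.properties'' and that you want exactly one active `gora.datastore.default` entry. The helper `set_default_gora_store` is hypothetical (not part of Nutch or Gora); the property key and the `HBaseStore` class name are the ones used in Nutch's gora.properties.

```shell
#!/bin/sh
# Sketch only: set the default Gora datastore in a gora.properties file.
# set_default_gora_store is a hypothetical helper, not part of Nutch or Gora.

HBASE_STORE_CLASS="org.apache.gora.hbase.store.HBaseStore"

# Usage: set_default_gora_store <path/to/gora.properties> <store class>
set_default_gora_store() {
  props_file="$1"
  store_class="$2"
  tmp_file="${props_file}.tmp.$$"

  # Copy every line except active gora.datastore.default entries;
  # commented-out alternatives such as "#gora.datastore.default=..." are kept.
  if [ -f "$props_file" ]; then
    grep -v '^gora\.datastore\.default=' "$props_file" > "$tmp_file" || true
  else
    : > "$tmp_file"
  fi

  # Append the chosen store so exactly one active entry remains.
  printf 'gora.datastore.default=%s\n' "$store_class" >> "$tmp_file"
  mv "$tmp_file" "$props_file"
}
```

For example, `set_default_gora_store "$NUTCH_HOME/conf/gora.properties" "$HBASE_STORE_CLASS"`, followed by a rebuild (e.g. `ant runtime`) so the change reaches ''runtime/local/conf''.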
@@ -42, +42 @@

    nutch readdb
  }}}
  
+ '''N.B.''' The crawl command in the bin/nutch script is deprecated. You should either run the individual commands or use the bin/crawl script, which chains the individual commands together.
+ 
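To illustrate what chaining the individual commands means, here is a hedged shell sketch of a crawl loop. It assumes the Nutch 2.2-era command forms `inject <seedDir>`, `generate -topN N`, `fetch -all`, `parse -all` and `updatedb`; argument forms may differ between 2.x releases, so check the usage printed by ./bin/nutch. The `crawl_rounds` function and the `NUTCH` variable are hypothetical helpers, not part of Nutch.

```shell
#!/bin/sh
# Sketch only: run N crawl rounds by chaining individual Nutch 2.x commands.
# crawl_rounds and the NUTCH variable are hypothetical; the sub-command
# forms assume a Nutch 2.2-era bin/nutch and may differ in other releases.

NUTCH="${NUTCH:-./bin/nutch}"

# Usage: crawl_rounds <seedDir> <rounds> [topN]
crawl_rounds() {
  seed_dir="$1"
  rounds="$2"
  top_n="${3:-50}"

  # Seed the web table with the URLs listed in seed_dir.
  "$NUTCH" inject "$seed_dir" || return 1

  i=1
  while [ "$i" -le "$rounds" ]; do
    # One round: select URLs, fetch them, parse the content,
    # then update the web table with newly discovered links.
    "$NUTCH" generate -topN "$top_n" || return 1
    "$NUTCH" fetch -all || return 1
    "$NUTCH" parse -all || return 1
    "$NUTCH" updatedb || return 1
    i=$((i + 1))
  done
}
```

Run from ''$NUTCH_HOME/runtime/local'', e.g. `crawl_rounds urls 3`. The shipped bin/crawl script does more than this loop, so prefer it for real crawls.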
  More details can be found in the log file ''$NUTCH_HOME/runtime/local/logs/hadoop.log''.
  
  '''N.B.''' You may encounter the following exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration. This happens when the HBase test jar, rather than the main HBase jar, ends up in the lib directory. To resolve it, copy the HBase jar from your HBase installation directory into the build lib directory. (A fix for this issue is in progress.)
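Before copying anything, a small diagnostic can tell whether a lib directory holds only the HBase test jar. This sketch assumes the usual HBase jar naming (`hbase-<version>.jar` for the main jar and `hbase-<version>-tests.jar` for the test jar, e.g. `hbase-0.90.4-tests.jar`); the function name `hbase_jar_status` and its `ok` / `tests-only` / `missing` output are hypothetical.

```shell
#!/bin/sh
# Sketch only: report which HBase jars a lib directory contains.
# hbase_jar_status is hypothetical; it assumes hbase-<version>.jar and
# hbase-<version>-tests.jar naming.

# Usage: hbase_jar_status <lib dir>
# Prints "ok" if a main HBase jar is present, "tests-only" if only the
# test jar is present, and "missing" if neither is found.
hbase_jar_status() {
  lib_dir="$1"
  have_main=no
  have_tests=no

  for jar in "$lib_dir"/hbase-*.jar; do
    [ -e "$jar" ] || continue   # the glob matched nothing
    case "${jar##*/}" in
      hbase-*-tests.jar) have_tests=yes ;;
      *) have_main=yes ;;
    esac
  done

  if [ "$have_main" = yes ]; then
    echo ok
  elif [ "$have_tests" = yes ]; then
    echo tests-only
  else
    echo missing
  fi
}
```

Point it at whichever lib directory your build uses, for example `hbase_jar_status "$NUTCH_HOME/runtime/local/lib"`. A result of `tests-only` matches the situation described above.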
  
- '''N.B.''' The process of using the other datastore implementations offered within Gora e.g. Apache Cassandra, Accumulo and Sql, can be achieved simply by tweaking the above settings prior to compiling the Nutch code.
+ '''N.B.''' The other datastore implementations offered within Gora, e.g. Apache Cassandra and Apache Accumulo, can be used simply by adjusting the above settings before compiling the Nutch code.
+ 
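As a rough guide to what changes per backend, this sketch prints three values for a given backend: the Ivy dependency (same shape as the gora-hbase line above, assumed to live in Nutch's ivy.xml), the value for the `storage.data.store.class` property in nutch-site.xml (assumed to be the property that selects the Gora backend), and the `gora.datastore.default` line for gora.properties. The artifact and class names are assumptions based on Gora's module layout; verify them against the Gora release you use. The function `gora_backend_settings` is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the per-backend settings to change before rebuilding Nutch.
# gora_backend_settings is hypothetical; the artifact and class names below
# are assumptions to verify against your Gora release.

# Usage: gora_backend_settings <hbase|cassandra|accumulo> <gora version>
gora_backend_settings() {
  backend="$1"
  rev="$2"
  case "$backend" in
    hbase)     artifact="gora-hbase";     class="org.apache.gora.hbase.store.HBaseStore" ;;
    cassandra) artifact="gora-cassandra"; class="org.apache.gora.cassandra.store.CassandraStore" ;;
    accumulo)  artifact="gora-accumulo";  class="org.apache.gora.accumulo.store.AccumuloStore" ;;
    *)
      echo "unsupported backend: $backend" >&2
      return 1
      ;;
  esac

  # Ivy dependency line, in the same shape as the gora-hbase line above.
  printf 'ivy.xml:         <dependency org="org.apache.gora" name="%s" rev="%s" conf="*->default" />\n' "$artifact" "$rev"
  # Value for the storage.data.store.class property in nutch-site.xml.
  printf 'nutch-site.xml:  storage.data.store.class = %s\n' "$class"
  # Default datastore line for gora.properties.
  printf 'gora.properties: gora.datastore.default=%s\n' "$class"
}
```

For example, `gora_backend_settings cassandra 0.3` lists the Cassandra equivalents of the HBase settings. The SQL backend is deliberately left out, since gora-sql is deprecated as of Gora 0.3.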
+ '''N.B.''' As of Apache Gora 0.3, the gora-sql 0.1.1-incubating artifact is deprecated. If you wish to use MySQL or HSQLDB as a Gora backend, you will need to downgrade to Nutch 2.1.
  
  For more details on the command line interface options, please see [[http://wiki.apache.org/nutch/CommandLineOptions|here]], or simply run ./bin/nutch, which prints usage information to stdout.
  Finally, for a more detailed Nutch (1.X) tutorial, please see [[http://wiki.apache.org/nutch/NutchTutorial|here]].
