Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "GORA_HBase" page has been changed by JulienNioche.
http://wiki.apache.org/nutch/GORA_HBase

--------------------------------------------------

New page:
This document describes how to get Nutch 2.0 to use HBase as a backend for GORA. It is based on revision 993857 of the Nutch trunk.

 * Install and configure HBase 0.20.6
 * Pull the GORA code and compile it
 * Copy the jars from gora/gora-hbase/lib-ext to nutch/lib
 * Add the following to nutch/ivy/ivy.xml

{{{
<dependency org="org.gora" name="gora-hbase" rev="0.1" conf="*->compile">
  <exclude org="com.sun.jdmk"/>
  <exclude org="com.sun.jmx"/>
  <exclude org="javax.jms"/>
</dependency>
}}}

 * Specify the GORA backend in nutch-site.xml

{{{
<property>
  <name>storage.data.store.class</name>
  <value>org.gora.hbase.store.HBaseStore</value>
  <description>Default class for storing data</description>
</property>
}}}

 * Add a mapping file for HBase in conf/gora-hbase-mapping.xml

{{{
<?xml version="1.0" encoding="UTF-8"?>
<gora-orm>
  <table name="webtable">
    <family name="p"/> <!-- This can also have params like compression, bloom filters -->
    <family name="f"/>
    <family name="s"/>
    <family name="il"/>
    <family name="ol"/>
    <family name="h"/>
    <family name="mtdt"/>
    <family name="mk"/>
  </table>
  <class table="webtable" keyClass="java.lang.String" name="org.apache.nutch.storage.WebPage">
    <!-- fetch fields -->
    <field name="baseUrl" family="f" qualifier="bas"/>
    <field name="status" family="f" qualifier="st"/>
    <field name="prevFetchTime" family="f" qualifier="pts"/>
    <field name="fetchTime" family="f" qualifier="ts"/>
    <field name="fetchInterval" family="f" qualifier="fi"/>
    <field name="retriesSinceFetch" family="f" qualifier="rsf"/>
    <field name="reprUrl" family="f" qualifier="rpr"/>
    <field name="content" family="f" qualifier="cnt"/>
    <field name="contentType" family="f" qualifier="typ"/>
    <field name="protocolStatus" family="f" qualifier="prot"/>
    <field name="modifiedTime" family="f" qualifier="mod"/>
    <!-- parse fields -->
    <field name="title" family="p" qualifier="t"/>
    <field name="text" family="p" qualifier="c"/>
    <field name="parseStatus" family="p" qualifier="st"/>
    <field name="signature" family="p" qualifier="sig"/>
    <field name="prevSignature" family="p" qualifier="psig"/>
    <!-- score fields -->
    <field name="score" family="s" qualifier="s"/>
    <field name="headers" family="h"/>
    <field name="inlinks" family="il"/>
    <field name="outlinks" family="ol"/>
    <field name="metadata" family="mtdt"/>
    <field name="markers" family="mk"/>
  </class>
</gora-orm>
}}}

 * Compile Nutch -> ant runtime
 * Make sure HBase is started and working properly

You should then be able to use it. Try going to ''$NUTCH_HOME/runtime/local/bin'' and running:

{{{
nutch inject /someseedDir
nutch readdb
}}}

You should find more details in the logs in ''$NUTCH_HOME/runtime/local/logs/hadoop.log''
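Because every field in the mapping file refers to a column family by name, a typo in a `family` attribute is easy to miss until Nutch is actually run. The following is a minimal sketch of a sanity check, not part of Nutch or GORA: the function name `undeclared_families` and the trimmed `SAMPLE` mapping are hypothetical, and the check assumes only the element and attribute names shown in the mapping above (`table`, `family`, `class`, `field`).

```python
import xml.etree.ElementTree as ET


def undeclared_families(mapping_xml):
    """Return {field name: family} for every <field> whose family is not
    declared on the <table> that its enclosing <class> points to."""
    root = ET.fromstring(mapping_xml)
    # Collect the column families declared for each <table>.
    declared = {
        table.get("name"): {fam.get("name") for fam in table.findall("family")}
        for table in root.findall("table")
    }
    problems = {}
    for cls in root.findall("class"):
        # An unknown table name yields an empty set, so all its fields are reported.
        families = declared.get(cls.get("table"), set())
        for field in cls.findall("field"):
            if field.get("family") not in families:
                problems[field.get("name")] = field.get("family")
    return problems


# Hypothetical, shortened mapping: family "s" is used by a field but never declared.
SAMPLE = """<gora-orm>
  <table name="webtable">
    <family name="f"/>
    <family name="p"/>
  </table>
  <class table="webtable" keyClass="java.lang.String" name="org.apache.nutch.storage.WebPage">
    <field name="status" family="f" qualifier="st"/>
    <field name="title" family="p" qualifier="t"/>
    <field name="score" family="s" qualifier="s"/>
  </class>
</gora-orm>"""

if __name__ == "__main__":
    print(undeclared_families(SAMPLE))
```

To check a real file, pass in the contents of conf/gora-hbase-mapping.xml instead of `SAMPLE`; an empty result means every field refers to a declared family.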