Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "GORA_HBase" page has been changed by FerdyGalema:
http://wiki.apache.org/nutch/GORA_HBase?action=diff&rev1=10&rev2=11

Comment:
Reflect changes in trunk: Added hbase-gora-mapping and added thrift to exclude

- This document describes how to get Nutch 2.0 to use HBase as a backend for 
GORA and is based on the revision 993857 of the Nutch trunk
+ This document describes how to get Nutch to use HBase as a backend for GORA 
and is based on the revision 993857 of the Nutch trunk
  
   * Install and configure HBase 0.20.6. You can check it out from 
[[http://svn.apache.org/repos/asf/hbase/tags/0.20.6/|here]] ('''N.B.''' It is 
important that you grab HBase version 0.20.6, as this is the version supported by Gora)
-  * Add the following to nutch/ivy/ivy.xml (global exclusion):
- 
- {{{
- <exclude module="thrift" />
- }}}
- 
   * Specify the GORA backend in nutch-site.xml
  
  {{{
@@ -20, +14 @@

  }}}
  Note: HBaseStore is currently NOT THREAD-SAFE, so every process should run 
with single-threaded settings (i.e. set the number of fetchers to 1). Work to make 
it thread-safe is in progress.
  
-  * Create a mapping file for hbase in conf/gora-hbase-mapping.xml
- 
- {{{
- <?xml version="1.0" encoding="UTF-8"?>
- <gora-orm>
- <table name="webpage">
-   <family name="p"/> <!-- This can also have params like compression, bloom filters -->
-   <family name="f"/>
-   <family name="s"/>
-   <family name="il"/>
-   <family name="ol"/>
-   <family name="h"/>
-   <family name="mtdt"/>
-   <family name="mk"/>
- </table>
- <class table="webpage" keyClass="java.lang.String" name="org.apache.nutch.storage.WebPage">
-   <!-- fetch fields                                       -->
-   <field name="baseUrl" family="f" qualifier="bas"/>
-   <field name="status" family="f" qualifier="st"/>
-   <field name="prevFetchTime" family="f" qualifier="pts"/>
-   <field name="fetchTime" family="f" qualifier="ts"/>
-   <field name="fetchInterval" family="f" qualifier="fi"/>
-   <field name="retriesSinceFetch" family="f" qualifier="rsf"/>
-   <field name="reprUrl" family="f" qualifier="rpr"/>
-   <field name="content" family="f" qualifier="cnt"/>
-   <field name="contentType" family="f" qualifier="typ"/>
-   <field name="protocolStatus" family="f" qualifier="prot"/>
-   <field name="modifiedTime" family="f" qualifier="mod"/>
-   <!-- parse fields                                       -->
-   <field name="title" family="p" qualifier="t"/>
-   <field name="text" family="p" qualifier="c"/>
-   <field name="parseStatus" family="p" qualifier="st"/>
-   <field name="signature" family="p" qualifier="sig"/>
-   <field name="prevSignature" family="p" qualifier="psig"/>
-   <!-- score fields                                       -->
-   <field name="score" family="s" qualifier="s"/>
-   <field name="headers" family="h"/>
-   <field name="inlinks" family="il"/>
-   <field name="outlinks" family="ol"/>
-   <field name="metadata" family="mtdt"/>
-   <field name="markers" family="mk"/>
- </class>
- </gora-orm>
- }}}
   * Compile Nutch by running: ant runtime
   * Make sure HBase is started and working properly as per the quick start 
tutorial [[http://hbase.apache.org/book/quickstart.html|here]]
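  
  For reference, the nutch-site.xml step above could look roughly like the 
property below. This is only a sketch and not the text elided from the diff: 
it assumes the backend is selected through a property named 
storage.data.store.class and that the HBase store class is 
org.apache.gora.hbase.store.HBaseStore. Check both names against 
conf/nutch-default.xml in your own checkout before relying on them.
  
  {{{
  <property>
    <!-- Assumed property name; verify it in conf/nutch-default.xml -->
    <name>storage.data.store.class</name>
    <!-- Assumed class name; older pre-Apache Gora builds used the org.gora package prefix -->
    <value>org.apache.gora.hbase.store.HBaseStore</value>
  </property>
  }}}
  The property goes inside the top-level configuration element of 
nutch-site.xml, next to any other site-specific overrides.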
  
