Hi,

I think we need to commit all the necessary files to Nutch so that it
works out of the box for SQL, HBase and Cassandra. We can even add
commented-out entries to gora.properties, nutch-site.xml, etc., so that using
Nutch with a different backend becomes just a configuration change. I will
open an issue to track this.
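
A rough sketch of what such commented-out entries could look like in
nutch-site.xml. The HBase class name is the one from the config quoted further
down in this thread; the SQL and Cassandra class names are assumptions made by
analogy and need to be checked against the Gora sources before use:

```xml
<!-- nutch-site.xml: select a storage backend by leaving exactly one
     storage.data.store.class property uncommented. -->
<configuration>

  <!-- HBase backend (class name taken from the config quoted in this thread) -->
  <property>
    <name>storage.data.store.class</name>
    <value>org.gora.hbase.store.HBaseStore</value>
    <description>Default class for storing data</description>
  </property>

  <!-- SQL backend; the class name is an assumption, verify it in your Gora version
  <property>
    <name>storage.data.store.class</name>
    <value>org.gora.sql.store.SqlStore</value>
    <description>Default class for storing data</description>
  </property>
  -->

  <!-- Cassandra backend; the class name is an assumption, verify it in your Gora version
  <property>
    <name>storage.data.store.class</name>
    <value>org.gora.cassandra.store.CassandraStore</value>
    <description>Default class for storing data</description>
  </property>
  -->

</configuration>
```

Switching backends would then mean moving the comment markers, plus providing
the matching Gora mapping file and backend settings for the chosen store.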

Cheers,
Enis

On Wed, Sep 8, 2010 at 1:53 PM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:

> Hi guys,
>
> I've summarized the steps for running GORA + HBase with Nutch 2.0 at
> http://wiki.apache.org/nutch/GORA_HBase
>
> Feel free to amend and improve as you see fit.
>
> Please bear in mind that Nutch 2.0 is at a very early stage and is far from
> being bug-proof, see in particular [1].
>
> HTH
>
> Julien
>
> [1] https://issues.apache.org/jira/browse/NUTCH-893
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>
>
> On 6 September 2010 13:35, Andrzej Bialecki <a...@getopt.org> wrote:
>
> > On 2010-09-05 14:56, David Stuart wrote:
> >
> >> Hi All,
> >>
> >> I have done as described below and can create a table from within the
> >> HBase shell. I found what looks like the appropriate create-table command,
> >>   bin/nutch org.apache.nutch.storage.WebTableCreator webtable
> >> but it only returns null.
> >>
> >> Any help would be great
> >>
> >
> > You don't have to create a table manually - this should happen
> > automatically when you first run any Nutch tool. Just make sure you have
> > hbase-site.xml on your classpath in Nutch - best if you put it in your
> > conf/ and rebuild, so that it's packed into a job jar.
> >
> > Here are, for example, my config files that work with HBase (I don't use
> > any non-standard settings for HBase, so my hbase-site.xml has no
> > properties, but it still needs to be included in the Nutch job jar):
> >
> > gora-hbase-mapping.xml:
> > -------------------------------------------------------------------------
> >
> > <gora-orm>
> >
> > <table name="webtable">
> >  <family name="p"/> <!-- This can also have params like compression, bloom filters -->
> >  <family name="f"/>
> >  <family name="s"/>
> >  <family name="il"/>
> >  <family name="ol"/>
> >  <family name="h"/>
> >  <family name="mtdt"/>
> >  <family name="mk"/>
> > </table>
> >
> > <class table="webtable" keyClass="java.lang.String"
> > name="org.apache.nutch.storage.WebPage">
> >  <!-- fetch fields                                       -->
> >  <field name="baseUrl" family="f" qualifier="bas"/>
> >  <field name="status" family="f" qualifier="st"/>
> >  <field name="prevFetchTime" family="f" qualifier="pts"/>
> >  <field name="fetchTime" family="f" qualifier="ts"/>
> >  <field name="fetchInterval" family="f" qualifier="fi"/>
> >  <field name="retriesSinceFetch" family="f" qualifier="rsf"/>
> >  <field name="reprUrl" family="f" qualifier="rpr"/>
> >  <field name="content" family="f" qualifier="cnt"/>
> >  <field name="contentType" family="f" qualifier="typ"/>
> >  <field name="protocolStatus" family="f" qualifier="prot"/>
> >  <field name="modifiedTime" family="f" qualifier="mod"/>
> >
> >  <!-- parse fields                                       -->
> >  <field name="title" family="p" qualifier="t"/>
> >  <field name="text" family="p" qualifier="c"/>
> >  <field name="parseStatus" family="p" qualifier="st"/>
> >  <field name="signature" family="p" qualifier="sig"/>
> >  <field name="prevSignature" family="p" qualifier="psig"/>
> >
> >  <!-- score fields                                       -->
> >  <field name="score" family="s" qualifier="s"/>
> >
> >  <field name="headers" family="h"/>
> >
> >  <field name="inlinks" family="il"/>
> >
> >  <field name="outlinks" family="ol"/>
> >
> >  <field name="metadata" family="mtdt"/>
> >
> >  <field name="markers" family="mk"/>
> >
> > </class>
> >
> > </gora-orm>
> > -------------------------------------------------------------------------
> >
> > nutch-site.xml:
> > -------------------------------------------------------------------------
> > ... blah blah, a lot of unrelated stuff...
> >
> > <property>
> >  <name>storage.data.store.class</name>
> >  <value>org.gora.hbase.store.HBaseStore</value>
> >
> >  <description>Default class for storing data</description>
> > </property>
> > -------------------------------------------------------------------------
> >
> > Of course, you also need to use the same Hadoop files (hdfs-site and
> > mapred-site) as the ones that HBase uses.
> >
> >
> > --
> > Best regards,
> > Andrzej Bialecki     <><
> >  ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
>
