Hi,

Thank you for the clarifications.
Regarding the metadata, what would be a proper way of parsing and indexing
multivalued tags in nutch-2.0 then?

Thanks.
Alex.



-----Original Message-----
From: Ferdy Galema <ferdy.gal...@kalooga.com>
To: user <user@nutch.apache.org>
Sent: Wed, Jun 27, 2012 1:20 am
Subject: Re: parse and solrindex in nutch-2.0


Hi,

Correct. When using <specific_batchid> or -all you have to run the
updaterjob first (because it checks that the dbupdate mark is not null). But
a workaround is to simply run the indexer with -reindex. This will ignore
the db update mark and try to index every parsed row (at any time).

About the metadata: It's a known limitation that there cannot be any
duplicate keys. (I'm not aware of any progress regarding this).
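
One possible way around the duplicate-key limitation, given only as a sketch and not as something confirmed in this thread, is to pack all values of a tag into a single string under one key and split them again at indexing time. The example below is language-neutral Python rather than Nutch code: the plain dict only stands in for the page metadata, and the names SEP, pack_values and unpack_values are hypothetical.

```python
# Hypothetical separator (ASCII unit separator); the assumption is that
# it never occurs inside a tag value.
SEP = "\x1f"


def pack_values(values):
    """Join several tag values into one string so they fit under one key."""
    for v in values:
        if SEP in v:
            raise ValueError("value contains the separator")
    return SEP.join(values)


def unpack_values(packed):
    """Split a packed string back into the list of values."""
    return packed.split(SEP) if packed else []


# Stand-in for the page metadata: a second put with the same key
# overwrites the first, which is the behaviour described in the thread.
metadata = {}
metadata["tag"] = "first"
metadata["tag"] = "second"  # "first" is lost here

# Packing keeps both values under the single key.
metadata["tag"] = pack_values(["first", "second"])
values = unpack_values(metadata["tag"])
```

The indexing side would then have to unpack the string and add one field value per entry. Note that a list consisting of a single empty string does not survive the round trip in this sketch, so empty values would need to be skipped or handled separately.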

fetcher.store.content indeed does not seem to work. This is a bug. I
created an issue for this: NUTCH-1411.

Ferdy.

On Tue, Jun 26, 2012 at 11:47 AM, Julien Nioche <
lists.digitalpeb...@gmail.com> wrote:

> update (or whatever the actual name of the command is) after parsing?
>
> On 25 June 2012 22:35, <alx...@aim.com> wrote:
>
> > Hello,
> >
> > I have tested nutch-2.0 with hbase and mysql trying to index only one url
> > with depth 1.
> >
> > I tried to fetch an HTML tag value and parse it into the metadata column
> > of the webpage object by adding a parse-tag plugin. I saw there is no
> > metadata member variable in the Parse class, so I used the putToMetadata
> > function from the Webpage class, and it turned out that this function
> > overwrites values for the same key, i.e., it keeps only the last tag
> > value if there are multiple tags.
> >
> > Next
> >
> > bin/nutch solrindex http://127.0.0.1:8983/solr/ -all
> > SolrIndexerJob: starting
> > SolrIndexerJob: done.
> >
> > I did
> > 1. bin/nutch inject
> > 2. bin/nutch generate
> > 3. bin/nutch fetch batchId
> > 4. bin/nutch parse batchId
> > 5. bin/nutch solrindex http://127.0.0.1:8983/solr/ -all
> >
> > There is no data added to solr index with the url I tried to index.
> >
> > Besides these, nutch-2.0 still keeps content in the content column of the
> > webpage table even if I put the following in the config:
> >
> >  <property>
> >    <name>fetcher.store.content</name>
> >    <value>false</value>
> >    <description>If true, fetcher will store content.</description>
> >  </property>
> >
> >
> > Any ideas on what is done wrong or how to fix these issues are welcome.
> >
> > Thanks.
> > Alex.
> >
> >
> >
> >
> >
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>

 
