Hi,

Thank you for the clarifications. Regarding the metadata, what would be a proper way of parsing and indexing multivalued tags in nutch-2.0 then?

Thanks.
Alex.

-----Original Message-----
From: Ferdy Galema <ferdy.gal...@kalooga.com>
To: user <user@nutch.apache.org>
Sent: Wed, Jun 27, 2012 1:20 am
Subject: Re: parse and solrindex in nutch-2.0

Hi,

Correct. When using <specific_batchid> or -all you have to run the updaterjob first. (Because it checks the dbupdate mark to not be null). But a workaround is to simply run the indexer with -reindex. This will ignore the db update mark and tries to index every parsed row (at any time).

About the metadata: It's a known limitation that there cannot be any duplicate keys. (I'm not aware of any progress regarding this).

fetcher.store.content indeed does not seem to work. This is a bug. I created an issue for this: NUTCH-1411

Ferdy.

On Tue, Jun 26, 2012 at 11:47 AM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:

> update (or whatever the actual name of the command is) after parsing?
>
> On 25 June 2012 22:35, <alx...@aim.com> wrote:
>
> > Hello,
> >
> > I have tested nutch-2.0 with hbase and mysql trying to index only one url
> > with depth 1.
> >
> > I tried to fetch an html tag value and parse it to metadata column in
> > webpage object by adding parse-tag plugin. I saw there is no metadata
> > member variable in Parse class, so I used putToMetadata function from
> > Webpage class and it turned out that this function overwrites values for
> > the same key, i.e, it keeps only the last tag value if there are multiple
> > tags.
> >
> > Next
> >
> > bin/nutch solrindex http://127.0.0.1:8983/solr/ -all
> > SolrIndexerJob: starting
> > SolrIndexerJob: done.
> >
> > I did
> > 1.bin/nutch inject
> > 2.bin/nutch generate
> > 3.bin/nutch fetch batchId
> > 4.bin/nutch parse batchId
> > 5.bin/nutch bin/nutch solrindex http://127.0.0.1:8983/solr/ -all
> >
> > There is no data added to solr index with the url I tried to index.
> >
> > Besides these, nutch-2.0 keeps content in the content column of webpage
> > table if I put in the config
> >
> > <property>
> > <name>fetcher.store.content</name>
> > <value>false</value>
> > <description>If true, fetcher will store content.</description>
> > </property>
> >
> > Any ideas, what is done wrong or how to fix these issues are welcome.
> >
> > Thanks.
> > Alex.
> >
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
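
The thread above says that putToMetadata keeps only the last value written for a key, and that duplicate keys are a known limitation. One possible workaround, sketched here purely as an assumption and not as anything Nutch provides, is to join all values for a key into a single stored value with a separator and to split them again at index time. Everything in this sketch is hypothetical: the class MultiValueMetadata, the methods addValue and getValues, and the separator SEP are illustrative names, and the plain java.util.Map only stands in for the metadata map of the WebPage object, so the code would still need to be adapted to the real Nutch classes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class MultiValueMetadata {
    // Hypothetical separator (ASCII unit separator); assumed never to occur in tag values.
    public static final String SEP = "\u001F";

    // Appends a value under the key instead of overwriting the previous one.
    // The Map is a stand-in for the page metadata, not the Nutch API itself.
    public static void addValue(Map<String, ByteBuffer> metadata, String key, String value) {
        ByteBuffer existing = metadata.get(key);
        String joined = (existing == null) ? value : decode(existing) + SEP + value;
        metadata.put(key, ByteBuffer.wrap(joined.getBytes(StandardCharsets.UTF_8)));
    }

    // Splits the stored value back into the individual values, e.g. at index time.
    public static List<String> getValues(Map<String, ByteBuffer> metadata, String key) {
        ByteBuffer stored = metadata.get(key);
        if (stored == null) {
            return new ArrayList<>();
        }
        return Arrays.asList(decode(stored).split(Pattern.quote(SEP), -1));
    }

    // Decodes a copy of the buffer so the stored buffer's position is left untouched.
    private static String decode(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> metadata = new HashMap<>();
        addValue(metadata, "tag", "first");
        addValue(metadata, "tag", "second");
        System.out.println(getValues(metadata, "tag")); // prints [first, second]
    }
}
```

If something along these lines were used, each split value would presumably be added separately to the Solr document on the indexing side, into a field declared multiValued in the Solr schema. That part depends on the indexing plugin in use and is not shown here.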