Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "bin/nutch solrindex" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/bin/nutch%20solrindex

Comment:
Update to reflect Nutch 1.3 API

New page:
Solrindex is an alias for org.apache.nutch.indexer.solr.SolrIndexer

This class replaces the legacy dependency in Nutch <1.3 on indexing to Apache 
Lucene for subsequent search. We now pass a Solr URL (amongst other arguments) 
to post data crawled by Nutch for search within an Apache Solr core.

Note: This class currently issues a single commit for all the reducers in one 
go. This is subject to change in subsequent versions of Nutch, as a commit can 
take a lot of resources (cache warming) and it is not always necessary to 
commit after solrindex, solrdedup or solrclean, especially if they are run 
immediately one after another.

Usage:
{{{
bin/nutch solrindex <solr url> <crawldb> <linkdb> (<segment> ... | -dir <segments>)
}}}

'''<solr url>''': The HTTP URL of the Solr instance you wish to index data 
into, e.g. ''http://localhost:8983/solr''

'''<crawldb>''': The path to the crawldb directory.

'''<linkdb>''': The path to the linkdb directory.

'''<segment> ...''': The path(s) to one or more individual segment directories.

'''-dir <segments>''': The path to a parent directory that contains several 
segment directories; each segment found there is indexed.
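
As a rough illustration of the two ways of supplying segments, the sketch below assembles one command per form and prints it rather than running it. The crawl directory name ''crawl'' and the segment name ''20110701120000'' are hypothetical placeholders, and the sketch assumes that -dir is given the directory that holds the segments; only the Solr URL is taken from the example above.

```shell
#!/bin/sh
# Assumed values for illustration only; adjust them to your installation.
SOLR_URL="http://localhost:8983/solr"   # example URL from the argument list
CRAWL_DIR="crawl"                       # hypothetical crawl output directory

# Form 1: pass the directory holding the segments with -dir.
CMD_DIR="bin/nutch solrindex $SOLR_URL $CRAWL_DIR/crawldb $CRAWL_DIR/linkdb -dir $CRAWL_DIR/segments"

# Form 2: name an individual segment explicitly (segment name is hypothetical).
SEGMENT="$CRAWL_DIR/segments/20110701120000"
CMD_SEG="bin/nutch solrindex $SOLR_URL $CRAWL_DIR/crawldb $CRAWL_DIR/linkdb $SEGMENT"

# Print the commands instead of executing them, so the sketch is safe to run
# anywhere; execute them from the Nutch directory once the paths exist.
echo "$CMD_DIR"
echo "$CMD_SEG"
```

Further segment paths can be appended to the second form, separated by spaces.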


CommandLineOptions
