To add to Julien's comments: Gabriele contributed a script a while ago that addresses this issue (although I have not used his scripts extensively). It might be worth a look. Try the link below:
http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script

On Tue, Jul 12, 2011 at 2:15 PM, Julien Nioche <[email protected]> wrote:

> Hi Matthew,
>
> This is usually achieved by writing a script containing the individual
> Nutch commands (as opposed to calling 'nutch crawl') and index at the end of
> a generate-fetch-parse-update-linkdb sequence. You don't need any plugins
> for that
>
> HTH
>
> Julien
>
>
> On 12 July 2011 13:35, Matthew Painter <[email protected]> wrote:
>
>> Hi all,
>>
>> I was wondering about the feasibility of creating a plugin for nutch that
>> create a solr update command, and added it to a queue for indexing after it
>> first parses the page, rather than when crawling has finished.
>>
>> This would allow you to do "real-time" indexing when crawling.
>>
>> Drawbacks: Not able to use the graph to give relevancy information.
>>
>> Wondering what initial thoughts are about this?
>>
>> Thanks :)
>>
>>
>>
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com

--
*Lewis*
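
PS: for anyone reading this in the archive, here is a rough sketch of the kind of script Julien describes in the quoted message: one generate-fetch-parse-update-linkdb round, with the Solr indexing step run at the end of the round instead of after the whole crawl. This is not Gabriele's script from the wiki. The launcher path `bin/nutch`, the `crawldb`/`linkdb`/`segments` directory layout, the `-topN` value, and the helper names are all assumptions made for illustration, and the argument order follows my understanding of the Nutch 1.x commands, so please check it against the usage output of your own version.

```python
import subprocess
from pathlib import Path

NUTCH = "bin/nutch"  # assumption: path to the Nutch launcher script


def list_segments(segments_dir):
    """Return the set of segment directory paths currently present."""
    path = Path(segments_dir)
    if not path.is_dir():
        return set()
    return {str(p) for p in path.iterdir() if p.is_dir()}


def segment_commands(segment, crawldb, linkdb, solr_url):
    """Commands to run on one freshly generated segment, in order."""
    return [
        [NUTCH, "fetch", segment],
        [NUTCH, "parse", segment],
        [NUTCH, "updatedb", crawldb, segment],
        [NUTCH, "invertlinks", linkdb, segment],
        # Index this segment straight away rather than after the full crawl.
        [NUTCH, "solrindex", solr_url, crawldb, linkdb, segment],
    ]


def crawl_round(crawl_dir, solr_url, top_n=1000, run=subprocess.check_call):
    """Run one round; return the new segment path, or None if none was made."""
    crawldb = str(Path(crawl_dir) / "crawldb")
    linkdb = str(Path(crawl_dir) / "linkdb")
    segments_dir = str(Path(crawl_dir) / "segments")

    before = list_segments(segments_dir)
    run([NUTCH, "generate", crawldb, segments_dir, "-topN", str(top_n)])
    created = sorted(list_segments(segments_dir) - before)
    if not created:
        return None  # nothing was due for fetching in this round

    # Segment names are timestamps, so the last one sorted is the newest.
    segment = created[-1]
    for command in segment_commands(segment, crawldb, linkdb, solr_url):
        run(command)
    return segment
```

Calling `crawl_round("crawl", "http://localhost:8983/solr/")` in a loop (the Solr URL is a placeholder) would push each round's pages to Solr as soon as that round finishes. It assumes the seed URLs have already been injected into the crawldb. The `run` parameter is only there so the commands can be logged or dry-run instead of executed.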

