To add to Julien's comments: Gabriele contributed a script a while ago that
addresses this issue, although I have not used it extensively. It may be
worth a look at the link below.

http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script
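
For reference, the approach Julien describes in the quoted message (individual Nutch commands in a script, with indexing at the end of each generate-fetch-parse-update-linkdb round) might look roughly like the sketch below. This is not Gabriele's script. It assumes a Nutch 1.x install where the `solrindex` command is available, and the paths, Solr URL, `-topN` value and the function names `crawl_round` and `run_crawl` are all placeholders of mine.

```shell
#!/bin/sh
# Sketch only: incremental crawl for Nutch 1.x, indexing into Solr after
# every round instead of once at the end of the whole crawl.
# Every default below is an assumption; adjust to your own layout.
NUTCH="${NUTCH:-bin/nutch}"
CRAWLDB="${CRAWLDB:-crawl/crawldb}"
LINKDB="${LINKDB:-crawl/linkdb}"
SEGMENTS="${SEGMENTS:-crawl/segments}"
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/}"
TOPN="${TOPN:-1000}"

# One generate-fetch-parse-update-linkdb round, then index that segment.
# The round stops at the first step that exits non-zero.
crawl_round() {
    "$NUTCH" generate "$CRAWLDB" "$SEGMENTS" -topN "$TOPN" || return 1
    # Segment directories are named by timestamp, so the newest sorts last.
    segment=$(ls -d "$SEGMENTS"/* | sort | tail -n 1)
    "$NUTCH" fetch "$segment" || return 1
    "$NUTCH" parse "$segment" || return 1
    "$NUTCH" updatedb "$CRAWLDB" "$segment" || return 1
    "$NUTCH" invertlinks "$LINKDB" "$segment" || return 1
    "$NUTCH" solrindex "$SOLR_URL" "$CRAWLDB" "$LINKDB" "$segment"
}

# Seed the crawldb once from a directory of URL lists ($1),
# then run a fixed number of rounds ($2).
run_crawl() {
    "$NUTCH" inject "$CRAWLDB" "$1" || return 1
    i=0
    while [ "$i" -lt "$2" ]; do
        crawl_round || break
        i=$((i + 1))
    done
}
```

Calling `run_crawl urls 5` from the Nutch home directory would seed from `urls/` and run five rounds, so new pages become searchable after each round rather than only when the whole crawl ends. Indexing one segment per round keeps each Solr update small, although link-based scoring only reflects the links seen so far.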

On Tue, Jul 12, 2011 at 2:15 PM, Julien Nioche <[email protected]> wrote:

> Hi Matthew,
>
> This is usually achieved by writing a script containing the individual
> Nutch commands (as opposed to calling 'nutch crawl') and indexing at the end
> of each generate-fetch-parse-update-linkdb sequence. You don't need any
> plugins for that.
>
> HTH
>
> Julien
>
>
> On 12 July 2011 13:35, Matthew Painter <[email protected]> wrote:
>
>> Hi all,
>>
>> I was wondering about the feasibility of creating a plugin for Nutch that
>> creates a Solr update command and adds it to a queue for indexing as soon
>> as the page has been parsed, rather than when crawling has finished.
>>
>> This would allow you to do "real-time" indexing when crawling.
>>
>> Drawbacks: Not able to use the graph to give relevancy information.
>>
>> Wondering what initial thoughts are about this?
>>
>> Thanks :)
>>
>>
>>
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>



-- 
*Lewis*
