[ 
https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404198#comment-15404198
 ] 

Sujen Shah commented on NUTCH-2132:
-----------------------------------

Hi Everyone, 
I have created an initial PR for this: https://github.com/apache/nutch/pull/138
Things I have changed: 
1. Removed the hard dependency on RabbitMQ 
2. Created an interface for a new plugin extension 
3. Moved the RabbitMQ code as a new plugin 

I need help getting the plugin working. I registered the new plugin 
extension and tried to develop the publisher plugin in the same way the scoring 
filters work, which should allow more than one publisher queue implementation 
to work together. Then I registered the plugin developed for RabbitMQ. The code 
builds and the fetcher runs fine, but the publisher does not work correctly. 
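
To illustrate the design idea only, here is a minimal, self-contained sketch 
of several publisher implementations working together behind one interface. 
It does not use the Nutch plugin API, and the names EventPublisher, 
CompositePublisher and InMemoryPublisher are hypothetical, not the classes 
from the PR. 

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical extension interface; name and method are assumptions,
// not the interface defined in the PR.
interface EventPublisher {
    void publish(String event);
}

// Fans each event out to every registered publisher, similar in spirit
// to how a chain of scoring filters is applied one after another.
final class CompositePublisher implements EventPublisher {
    private final List<EventPublisher> delegates = new ArrayList<>();

    CompositePublisher(List<? extends EventPublisher> publishers) {
        delegates.addAll(publishers);
    }

    // Number of publishers actually loaded; useful to log at startup.
    int size() {
        return delegates.size();
    }

    @Override
    public void publish(String event) {
        for (EventPublisher p : delegates) {
            p.publish(event);
        }
    }
}

// Stand-in for a queue-backed plugin (e.g. RabbitMQ): keeps events in memory.
final class InMemoryPublisher implements EventPublisher {
    final List<String> received = new ArrayList<>();

    @Override
    public void publish(String event) {
        received.add(event);
    }
}

class PublisherSketchDemo {
    public static void main(String[] args) {
        InMemoryPublisher a = new InMemoryPublisher();
        InMemoryPublisher b = new InMemoryPublisher();
        CompositePublisher all = new CompositePublisher(Arrays.asList(a, b));
        all.publish("fetch-start");
        all.publish("fetch-end");
        System.out.println(all.size() + " publishers, a=" + a.received + ", b=" + b.received);
    }
}
```

In this sketch, a CompositePublisher built from an empty list accepts every 
publish() call and does nothing, so a failure to load plugins would not raise 
an error by itself. Logging size() at startup would make that case visible. 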

The issue I face is in getting the RabbitMQ plugin loaded in the 
NutchPublishers class. The setConfig() method is not able to load any plugins. 
Link to code - 
https://github.com/sujen1412/nutch/blob/2c484ec4789c84f7bf9e592e15c96cf788ef5967/src/java/org/apache/nutch/publisher/NutchPublishers.java#L38-L54

What am I missing here? 

Thanks for the help :) 

> Publisher/Subscriber model for Nutch to emit events 
> ----------------------------------------------------
>
>                 Key: NUTCH-2132
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2132
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher, REST_api
>            Reporter: Sujen Shah
>            Assignee: Chris A. Mattmann
>              Labels: memex
>             Fix For: 1.13
>
>         Attachments: NUTCH-2132.patch, NUTCH-2132.v2.patch, 
> PubSub_routingkey.patch
>
>
> It would be nice to have a Pub/Sub model in Nutch to emit certain events (e.g. 
> Fetcher events like fetch-start, fetch-end, a fetch report which may contain 
> data like outlinks of the current fetched URL, score, etc). 
> A consumer of this functionality could use this data to generate real-time 
> visualizations and statistics of the crawl without having to wait for 
> the fetch round to finish. 
> The REST API could contain an endpoint which would respond with a URL to 
> which a client could subscribe to get the fetcher events. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
