[ 
https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789002#comment-16789002
 ] 

Roannel Fernández Hernández edited comment on NUTCH-2334 at 3/27/19 8:04 PM:
-----------------------------------------------------------------------------

Returning to this issue after a long time (sorry for that), the strategy is as 
follows:

First of all, to ensure that this is not a difficult and abrupt change for 
users, we should mark the current properties as deprecated, and the current 
features and the new ones should coexist for a while.

Secondly, a new property is introduced: voter.strategy (this might not be the 
final name), which indicates the strategy to follow when deciding whether a URL 
should be fetched or not. In this case the schedulers act as voters following 
one of two main strategies: AND and OR. With the first strategy, all voters 
(schedulers) have to say yes (TRUE) for the URL to be fetched. With the second 
strategy, it is enough that one voter (scheduler) says yes. A scheduler can 
abstain from voting (by returning null). In that case its vote is not counted 
in the final decision.
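
To make the voting rule concrete, here is a minimal sketch of how AND/OR with 
abstention could be combined. It is only an illustration, not the code from my 
branch: the class, enum and method names are hypothetical, and the fallback 
used when every scheduler abstains is an assumption, since that case is not 
decided above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names exist in Nutch.
final class VoterStrategySketch {

  enum Strategy { AND, OR }

  /**
   * Combines the votes of all schedulers.
   * TRUE = fetch, FALSE = do not fetch, null = abstain (ignored).
   * Assumption: if every voter abstains, the given fallback is returned.
   */
  static boolean shouldFetch(Strategy strategy, List<Boolean> votes,
      boolean fallback) {
    boolean anyVote = false;
    for (Boolean vote : votes) {
      if (vote == null) {
        continue; // abstention: not counted
      }
      anyVote = true;
      if (strategy == Strategy.AND && !vote) {
        return false; // AND: a single "no" rejects the URL
      }
      if (strategy == Strategy.OR && vote) {
        return true; // OR: a single "yes" accepts the URL
      }
    }
    if (!anyVote) {
      return fallback;
    }
    // AND with no "no" votes accepts; OR with no "yes" votes rejects.
    return strategy == Strategy.AND;
  }

  public static void main(String[] args) {
    List<Boolean> votes = Arrays.asList(Boolean.TRUE, null, Boolean.FALSE);
    System.out.println(shouldFetch(Strategy.AND, votes, false)); // false
    System.out.println(shouldFetch(Strategy.OR, votes, false));  // true
  }
}
```

With the votes yes, abstain, no, the AND strategy rejects the URL and the OR 
strategy accepts it.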

In the other cases where schedulers are involved, only the scheduler that is 
loaded first is responsible for doing the task. A property is used to 
indicate the order in which these plugins must be loaded. The code is 
[here|https://gist.github.com/r0ann3l/18367e0a762f7c30e326fac2b098979c]

I understand these changes could be a little complicated for end users to 
understand. That's why I have a second proposal: to make the schedulers 
pluggable and nothing else. The only issue I see in this case is that several 
plugins for the same extension point could be loaded. So, we could use the 
first loaded plugin (maybe show a warning too) or throw a {{RuntimeException}}.
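
The two options for this second proposal could look roughly like the sketch 
below. Again, this is only an illustration: the class name, the method and the 
{{failOnMultiple}} flag are hypothetical, and the behaviour when no plugin is 
loaded at all is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names exist in Nutch.
final class SingleSchedulerSketch {

  /**
   * Picks the scheduler to use from the plugins loaded for the extension
   * point. With several plugins loaded, either the first one is used and a
   * warning is printed, or a RuntimeException is thrown.
   */
  static <T> T selectOne(List<T> loaded, boolean failOnMultiple) {
    if (loaded.isEmpty()) {
      // Assumption: having no scheduler at all is treated as an error.
      throw new IllegalStateException("No scheduler plugin loaded");
    }
    if (loaded.size() > 1) {
      if (failOnMultiple) {
        throw new RuntimeException(
            "Expected one scheduler plugin but found " + loaded.size());
      }
      System.err.println("WARN: " + loaded.size()
          + " scheduler plugins loaded, using the first one: "
          + loaded.get(0));
    }
    return loaded.get(0);
  }

  public static void main(String[] args) {
    List<String> loaded = Arrays.asList("schedulerA", "schedulerB");
    // Lenient option: warn and use the first loaded plugin.
    System.out.println(selectOne(loaded, false));
  }
}
```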

I need some feedback from you guys to continue this work. Thanks a lot.


was (Author: roannel):
Returning to this issue after a long time (sorry for that), the strategy is as 
follows:

First of all, to ensure that this is not a difficult and abrupt change for 
users, we should mark the current properties as deprecated, and the current 
features and the new ones should coexist for a while.

Secondly, a new property is introduced: voter.strategy (this might not be the 
final name), which indicates the strategy to follow when deciding whether a URL 
should be fetched or not. In this case the schedulers act as voters following 
one of two main strategies: AND and OR. With the first strategy, all voters 
(schedulers) have to say yes (TRUE) for the URL to be fetched. With the second 
strategy, it is enough that one voter (scheduler) says yes. A scheduler can 
abstain from voting (by returning null). In that case its vote is not counted 
in the final decision.

In the other cases where schedulers are involved, only the scheduler that is 
loaded first is responsible for doing the task. A property is used to 
indicate the order in which these plugins must be loaded. The code is 
[here|https://github.com/r0ann3l/nutch/blob/NUTCH-2334/src/java/org/apache/nutch/crawl/FetchSchedulers.java]

I understand these changes could be a little complicated for end users to 
understand. That's why I have a second proposal: to make the schedulers 
pluggable and nothing else. The only issue I see in this case is that several 
plugins for the same extension point could be loaded. So, we could use the 
first loaded plugin (maybe show a warning too) or throw a {{RuntimeException}}.

I need some feedback from you guys for continuing this work. Thanks a lot.

> Extension point for schedulers
> ------------------------------
>
>                 Key: NUTCH-2334
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2334
>             Project: Nutch
>          Issue Type: New Feature
>          Components: generator
>    Affects Versions: 1.12
>            Reporter: Roannel Fernández Hernández
>            Assignee: Roannel Fernández Hernández
>            Priority: Minor
>             Fix For: 1.16
>
>
> With an extension point for schedulers, users should be able to create 
> new schedulers that meet their own needs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
