[
https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789002#comment-16789002
]
Roannel Fernández Hernández edited comment on NUTCH-2334 at 3/27/19 8:04 PM:
-----------------------------------------------------------------------------
Returning to this issue after a long time (sorry for that), the strategy is as
follows:
First of all, to ensure that this is not a difficult and abrupt change for
users, we should mark the current properties as deprecated, and both the
current features and the new ones should coexist for a while.
Secondly, a new property is introduced: voter.strategy (this may not be the
final name), which indicates the strategy to follow when deciding whether a URL
should be fetched or not. In this case the schedulers act as voters following
one of two main strategies: AND and OR. With the first strategy, all voters
(schedulers) have to say yes (TRUE) for the URL to be fetched. With the second
strategy, it is enough that one voter (scheduler) says yes. A scheduler can
abstain from voting (by returning null). In that case its vote is not counted
in the final decision.
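To make the voting rule concrete, here is a minimal sketch of how the AND and OR strategies could combine votes. It is not the code from the linked gist: the class name {{VoterStrategySketch}}, the {{shouldFetch}} method, and the {{fallback}} parameter are all hypothetical. In particular, what happens when every scheduler abstains is not specified above, so the sketch simply returns a caller-supplied default in that case.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names are part of the Nutch API.
final class VoterStrategySketch {

  enum Strategy { AND, OR }

  /**
   * Combines scheduler votes. A null vote is an abstention and is skipped.
   * ASSUMPTION: when every voter abstains, the caller-supplied fallback is
   * returned; the proposal does not say what should happen in that case.
   */
  static boolean shouldFetch(Strategy strategy, List<Boolean> votes, boolean fallback) {
    boolean anyVote = false;
    for (Boolean vote : votes) {
      if (vote == null) {
        continue; // abstention: not counted in the final decision
      }
      anyVote = true;
      if (strategy == Strategy.AND && !vote) {
        return false; // AND: a single "no" rejects the URL
      }
      if (strategy == Strategy.OR && vote) {
        return true; // OR: a single "yes" accepts the URL
      }
    }
    if (!anyVote) {
      return fallback;
    }
    // AND reached the end with only "yes" votes; OR reached it with only "no" votes.
    return strategy == Strategy.AND;
  }

  public static void main(String[] args) {
    List<Boolean> votes = Arrays.asList(Boolean.TRUE, null, Boolean.FALSE);
    System.out.println(shouldFetch(Strategy.AND, votes, false)); // prints "false"
    System.out.println(shouldFetch(Strategy.OR, votes, false));  // prints "true"
  }
}
```

Passing the fallback explicitly keeps the "everyone abstained" decision visible to the caller instead of hiding it inside the combiner.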
In the other cases where schedulers come into play, only the scheduler that is
loaded first is responsible for doing the task. One property is used to
indicate the order in which these plugins must be loaded. The code is
[here|https://gist.github.com/r0ann3l/18367e0a762f7c30e326fac2b098979c]
I understand these changes could be a little complicated for end users to
understand. That's why I have a second proposal: to make the schedulers
pluggable and nothing else. The only issue I see in this case is that several
plugins for the same extension point could be loaded. So, we could use the
first loaded plugin (maybe showing a warning too) or throw a {{RuntimeException}}.
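The two options for handling several loaded plugins could look roughly like the sketch below. All names here ({{SingleSchedulerSketch}}, {{select}}, {{failOnMultiple}}) are hypothetical and only illustrate the choice; the list stands in for whatever the plugin system returns, in load order. Throwing on an empty list is an additional assumption of the sketch.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch: none of these names are part of the Nutch API.
final class SingleSchedulerSketch {

  private static final Logger LOG =
      Logger.getLogger(SingleSchedulerSketch.class.getName());

  /**
   * Picks the scheduler to use from the plugins loaded for one extension point.
   * The list is assumed to be in load order. failOnMultiple is a hypothetical
   * switch between the two behaviours discussed: warn and use the first plugin,
   * or throw a RuntimeException.
   */
  static <T> T select(List<T> loaded, boolean failOnMultiple) {
    if (loaded.isEmpty()) {
      // ASSUMPTION: having no scheduler at all is treated as an error.
      throw new IllegalStateException("No scheduler plugin loaded");
    }
    if (loaded.size() > 1) {
      String msg = loaded.size() + " scheduler plugins loaded for one extension point";
      if (failOnMultiple) {
        throw new RuntimeException(msg);
      }
      LOG.warning(msg + "; using the first one loaded");
    }
    return loaded.get(0);
  }
}
```

Warning and continuing is friendlier to existing configurations, while throwing makes a misconfiguration impossible to miss.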
I need some feedback from you guys to continue this work. Thanks a lot.
was (Author: roannel):
Returning to this issue after a long time (sorry for that), the strategy is as
follows:
First of all, to ensure that this is not a difficult and abrupt change for
users, we should mark the current properties as deprecated, and both the
current features and the new ones should coexist for a while.
Secondly, a new property is introduced: voter.strategy (this may not be the
final name), which indicates the strategy to follow when deciding whether a URL
should be fetched or not. In this case the schedulers act as voters following
one of two main strategies: AND and OR. With the first strategy, all voters
(schedulers) have to say yes (TRUE) for the URL to be fetched. With the second
strategy, it is enough that one voter (scheduler) says yes. A scheduler can
abstain from voting (by returning null). In that case its vote is not counted
in the final decision.
In the other cases where schedulers come into play, only the scheduler that is
loaded first is responsible for doing the task. One property is used to
indicate the order in which these plugins must be loaded. The code is
[here|https://github.com/r0ann3l/nutch/blob/NUTCH-2334/src/java/org/apache/nutch/crawl/FetchSchedulers.java]
I understand these changes could be a little complicated for end users to
understand. That's why I have a second proposal: to make the schedulers
pluggable and nothing else. The only issue I see in this case is that several
plugins for the same extension point could be loaded. So, we could use the
first loaded plugin (maybe showing a warning too) or throw a {{RuntimeException}}.
I need some feedback from you guys to continue this work. Thanks a lot.
> Extension point for schedulers
> ------------------------------
>
> Key: NUTCH-2334
> URL: https://issues.apache.org/jira/browse/NUTCH-2334
> Project: Nutch
> Issue Type: New Feature
> Components: generator
> Affects Versions: 1.12
> Reporter: Roannel Fernández Hernández
> Assignee: Roannel Fernández Hernández
> Priority: Minor
> Fix For: 1.16
>
>
> With an extension point for schedulers, users should be able to create
> new schedulers that meet their own needs.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)