[jira] [Commented] (NUTCH-2334) Extension point for schedulers

Sebastian Nagel (JIRA) Wed, 29 Mar 2017 03:00:00 -0700

    [ 
https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946865#comment-15946865
 ]


Sebastian Nagel commented on NUTCH-2334:
----------------------------------------

Hi [~roannel],

see 
[scoring-adaptive|https://github.com/commoncrawl/nutch/blob/cc/src/plugin/scoring-adaptive/src/java/org/apache/nutch/scoring/adaptive/AdaptiveScoringFilter.java]
 which tries to do fetch scheduling in a ScoringFilter in combination with 
-topN, generate.min.score, and (per-host/per-queue) generate.max.count. The 
main difference is that configuration changes immediately impact the fetch list 
generation while a FetchSchedule sets (re)fetch time and intervals beforehand 
during CrawlDb update.
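
To illustrate why generation-time settings take effect immediately, here is a minimal sketch of a fetch list selection driven only by scores and limits. It is not the Nutch Generator: the class {{GenerateSelectionSketch}}, the {{Entry}} type and the {{select}} method are hypothetical stand-ins, and the parameters only mimic the roles of -topN, the minimum score and the per-host cap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these types are Nutch classes.
public class GenerateSelectionSketch {

  /** Hypothetical stand-in for a CrawlDb entry. */
  static class Entry {
    final String url;
    final String host;
    final float score;

    Entry(String url, String host, float score) {
      this.url = url;
      this.host = host;
      this.score = score;
    }
  }

  /**
   * Picks at most topN entries, highest score first, dropping entries
   * below minScore and allowing at most maxPerHost entries per host.
   */
  static List<Entry> select(List<Entry> candidates, int topN,
                            float minScore, int maxPerHost) {
    List<Entry> sorted = new ArrayList<>(candidates);
    sorted.sort((a, b) -> Float.compare(b.score, a.score));

    Map<String, Integer> perHost = new HashMap<>();
    List<Entry> selected = new ArrayList<>();
    for (Entry e : sorted) {
      if (selected.size() >= topN) {
        break; // fetch list is full
      }
      if (e.score < minScore) {
        break; // sorted by score, so all remaining entries are lower
      }
      int count = perHost.getOrDefault(e.host, 0);
      if (count >= maxPerHost) {
        continue; // per-host cap reached, try the next candidate
      }
      perHost.put(e.host, count + 1);
      selected.add(e);
    }
    return selected;
  }

  public static void main(String[] args) {
    List<Entry> db = Arrays.asList(
        new Entry("http://a.example/1", "a.example", 0.9f),
        new Entry("http://a.example/2", "a.example", 0.8f),
        new Entry("http://a.example/3", "a.example", 0.7f),
        new Entry("http://b.example/1", "b.example", 0.6f),
        new Entry("http://c.example/1", "c.example", 0.1f));
    for (Entry e : select(db, 3, 0.5f, 2)) {
      System.out.println(e.url);
    }
  }
}
```

Nothing is stored per URL in this sketch, so calling {{select}} with different limits changes the very next fetch list. A FetchSchedule, in contrast, has already written fetch time and interval into the CrawlDb.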

> having schedulers as plugins is an easier way to use and develop them and 
> maybe you can use several at the same time
That's true. You could stack FetchSchedule implementations via inheritance and 
then call {{super.shouldFetch(...)}}. But that's not really transparent and 
configurable.
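
A rough sketch of what stacking via inheritance could look like. The classes below are simplified stand-ins, not the Nutch FetchSchedule classes: {{Datum}}, {{BaseSchedule}}, {{MinScoreSchedule}} and {{SkipPrefixSchedule}} are hypothetical names, and the URL is a plain String for brevity.

```java
// Illustrative only: simplified stand-ins, not the Nutch API.
public class StackedScheduleSketch {

  /** Hypothetical stand-in for a CrawlDb entry. */
  static class Datum {
    final long fetchTime; // next scheduled fetch time, epoch millis
    final float score;

    Datum(long fetchTime, float score) {
      this.fetchTime = fetchTime;
      this.score = score;
    }
  }

  /** Base layer: a page is due once its scheduled fetch time has passed. */
  static class BaseSchedule {
    boolean shouldFetch(String url, Datum datum, long curTime) {
      return datum.fetchTime <= curTime;
    }
  }

  /** Hypothetical second layer: additionally require a minimum score. */
  static class MinScoreSchedule extends BaseSchedule {
    private final float minScore;

    MinScoreSchedule(float minScore) {
      this.minScore = minScore;
    }

    @Override
    boolean shouldFetch(String url, Datum datum, long curTime) {
      return super.shouldFetch(url, datum, curTime) && datum.score >= minScore;
    }
  }

  /** Hypothetical third layer: additionally skip URLs with a given prefix. */
  static class SkipPrefixSchedule extends MinScoreSchedule {
    private final String skippedPrefix;

    SkipPrefixSchedule(float minScore, String skippedPrefix) {
      super(minScore);
      this.skippedPrefix = skippedPrefix;
    }

    @Override
    boolean shouldFetch(String url, Datum datum, long curTime) {
      return super.shouldFetch(url, datum, curTime)
          && !url.startsWith(skippedPrefix);
    }
  }

  public static void main(String[] args) {
    BaseSchedule schedule = new SkipPrefixSchedule(0.5f, "http://skip.example/");
    long now = 1_000L;
    // due and well scored
    System.out.println(schedule.shouldFetch("http://a.example/", new Datum(500L, 0.9f), now));
    // due but scored below the minimum
    System.out.println(schedule.shouldFetch("http://a.example/", new Datum(500L, 0.1f), now));
    // due and well scored, but under the skipped prefix
    System.out.println(schedule.shouldFetch("http://skip.example/x", new Datum(500L, 0.9f), now));
  }
}
```

The order and composition of the layers are fixed in the class hierarchy here, so changing the stack means changing code rather than configuration.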

What is your suggestion for a scheduler plugin interface?

> Extension point for schedulers
> ------------------------------
>
>                 Key: NUTCH-2334
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2334
>             Project: Nutch
>          Issue Type: New Feature
>          Components: generator
>    Affects Versions: 1.12
>            Reporter: Roannel Fernández Hernández
>            Priority: Minor
>             Fix For: 1.14
>
>
> With an extension point for schedulers, the users should be able to create 
> new schedulers that meet their own needs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
