[ https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187927#comment-13187927 ]
Andrzej Bialecki commented on NUTCH-1201:
------------------------------------------
I agree that there are situations where you might want a custom fetcher (e.g.
depth-first crawling), and it would be good to come up with some more specific
API than just MapRunner.
I'm not convinced yet that providing interfaces (or rather abstract classes)
for the existing plumbing in Fetcher is a good idea - let's figure out first
whether this code is reusable at all for some other fetching strategies,
because if it's not then providing custom queue impls. may offer little value,
and perhaps customization should be implemented on a different level.
Re. thread spinning - I haven't yet seen an unequivocal case that would prove
that crawl contention is caused by the thread mgmt in Fetcher. Usually on
closer look the bottleneck turned out to lie elsewhere (network I/O, remote
throttling, DNS lookups, politeness rules, etc).
> Allow for different FetcherThread impls
> ---------------------------------------
>
> Key: NUTCH-1201
> URL: https://issues.apache.org/jira/browse/NUTCH-1201
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.5
>
>
> For certain cases we need to modify parts of FetcherThread and make it
> pluggable. This introduces a new config directive, fetcher.impl, that takes a
> FQCN and uses that setting in Fetcher.fetch to load a class to use for
> job.setMapRunnerClass(). This new class has to extend Fetcher and its inner
> class FetcherThread. This allows for overriding methods in FetcherThread but
> also methods in Fetcher itself if required.
> A follow up on this issue would be to refactor parts of FetcherThread to make
> it easier to override small sections instead of copying the entire method
> body for a small change, which is now the case.
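
To make the fetcher.impl idea above concrete, here is a minimal sketch of
resolving a class from a FQCN setting and checking that it extends a base
class. It is not the patch attached to this issue: BaseFetcher,
DepthFirstFetcher and FetcherImplLoader are hypothetical stand-ins, and a plain
Map takes the place of the Hadoop/Nutch configuration so the example runs
without any dependencies. In Nutch itself the resolved class would presumably
be handed to job.setMapRunnerClass() rather than instantiated directly.

```java
import java.util.HashMap;
import java.util.Map;

public class FetcherImplLoader {

    /** Hypothetical stand-in for Nutch's Fetcher; not the real class. */
    public static class BaseFetcher {
        public String describe() { return "default"; }
    }

    /** Hypothetical custom implementation that overrides one method. */
    public static class DepthFirstFetcher extends BaseFetcher {
        @Override
        public String describe() { return "depth-first"; }
    }

    /**
     * Resolve the class named by the "fetcher.impl" key, falling back to
     * BaseFetcher when the key is unset or blank.
     */
    public static Class<? extends BaseFetcher> resolve(Map<String, String> conf) {
        String fqcn = conf.get("fetcher.impl");
        if (fqcn == null || fqcn.trim().isEmpty()) {
            return BaseFetcher.class;
        }
        try {
            // asSubclass throws ClassCastException if the named class
            // does not extend BaseFetcher.
            return Class.forName(fqcn.trim()).asSubclass(BaseFetcher.class);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("fetcher.impl not found: " + fqcn, e);
        }
    }

    /** Instantiate the resolved class through its no-argument constructor. */
    public static BaseFetcher create(Map<String, String> conf) {
        try {
            return resolve(conf).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate fetcher impl", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fetcher.impl", DepthFirstFetcher.class.getName());
        System.out.println(create(conf).describe());
    }
}
```

The subclass check is the part that matters for the design discussed here:
because the configured class must extend the base class, a custom
implementation can override a single method and inherit the rest, which is the
behaviour the description asks for.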