[ 
https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190050#comment-13190050
 ] 

Ken Krugler commented on NUTCH-1201:
------------------------------------

My 2 cents, based on ancient history.

We extended Nutch in several ways during my Krugle startup, and in general the 
experience wound up being pretty painful. Even with the help of Andrzej and 
Stefan Groschupf (two very knowledgeable Nutch developers), we wound up 
spinning our wheels.

Part of the problem was the monolithic nature of Nutch, which made (makes?) it 
hard to extend in any way that goes beyond the plugin extension points, and 
those are meant for code that doesn't need to do much other than output 
different results for the same input data.

My thought here is that I'd look at having a very high level extension point - 
e.g. "I've got a fetch list (generated by other Nutch code) in the segment, and 
now I need to process that list, with the end result being data in new sub-dirs 
in the segment". But keep the fetcher around as a re-usable component (see 
crawler-commons for one version from Bixo).

Then if you want to do some crazy crawl-3-deep, you can craft your own solution 
(which might not even use map-reduce).
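
To make that concrete, here is a rough sketch of what such a high-level 
extension point could look like. Everything in it is hypothetical: 
SegmentProcessor, PageFetcher, SimpleSegmentProcessor, the "content" sub-dir 
name and the part-N file naming are illustrative only, not existing Nutch or 
crawler-commons APIs. The point is just that the fetch-list processing and the 
fetcher are separate pieces, and the processing can be a plain loop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical extension point: take a segment's fetch list, write new sub-dirs. */
interface SegmentProcessor {
    void process(Path segmentDir, List<String> fetchList) throws IOException;
}

/** Hypothetical reusable fetcher component, not tied to any job framework. */
interface PageFetcher {
    byte[] fetch(String url) throws IOException;
}

/** One possible processor: a sequential loop, no map-reduce involved. */
final class SimpleSegmentProcessor implements SegmentProcessor {
    private final PageFetcher fetcher;

    SimpleSegmentProcessor(PageFetcher fetcher) {
        this.fetcher = fetcher;
    }

    @Override
    public void process(Path segmentDir, List<String> fetchList) throws IOException {
        // The sub-dir name is an assumption made for this sketch.
        Path contentDir = Files.createDirectories(segmentDir.resolve("content"));
        int i = 0;
        for (String url : fetchList) {
            // Store each fetched body in its own file under the new sub-dir.
            Files.write(contentDir.resolve("part-" + i), fetcher.fetch(url));
            i++;
        }
    }
}
```

A crawl-3-deep variant would be another SegmentProcessor that reuses the same 
PageFetcher but follows outlinks itself.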

-- Ken

PS - my personal bias is to implement custom solutions using Cascading & 
reusable Java classes, but I know that doesn't fit well with the more common 
use of Nutch, where "programming by XML" (configuration only) seems to be the 
sweet spot.


                
> Allow for different FetcherThread impls
> ---------------------------------------
>
>                 Key: NUTCH-1201
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1201
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>
> For certain cases we need to modify parts of FetcherThread and make it 
> pluggable. This introduces a new config directive, fetcher.impl, that takes a 
> FQCN. Fetcher.fetch uses that setting to load the class to use for 
> job.setMapRunnerClass(). This new class has to extend Fetcher, with an inner 
> class extending FetcherThread. This allows for overriding methods in 
> FetcherThread but also methods in Fetcher itself if required.
> A follow-up on this issue would be to refactor parts of FetcherThread to make 
> it easier to override small sections instead of copying the entire method 
> body for a small change, which is now the case.
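
The class-loading step described in the issue can be sketched roughly as 
follows. This is not the NUTCH-1201 patch: BaseFetcher, PoliteFetcher and 
FetcherLoader are hypothetical stand-ins, and java.util.Properties replaces the 
Hadoop configuration so the example runs on its own. Only the fetcher.impl 
property name comes from the issue text.

```java
import java.util.Properties;

/** Stand-in for Fetcher; in the issue the loaded class goes to job.setMapRunnerClass(). */
class BaseFetcher {
    String describe() {
        return "default fetcher";
    }
}

/** Hypothetical custom implementation that overrides a single method. */
class PoliteFetcher extends BaseFetcher {
    @Override
    String describe() {
        return "polite fetcher";
    }
}

final class FetcherLoader {
    private FetcherLoader() {
    }

    /** Resolve fetcher.impl to a class, falling back to BaseFetcher when it is unset. */
    static Class<? extends BaseFetcher> resolve(Properties conf) throws ClassNotFoundException {
        String fqcn = conf.getProperty("fetcher.impl", BaseFetcher.class.getName());
        // asSubclass throws ClassCastException if the named class does not extend BaseFetcher.
        return Class.forName(fqcn).asSubclass(BaseFetcher.class);
    }

    /** Instantiate the configured class through its no-argument constructor. */
    static BaseFetcher newInstance(Properties conf) throws ReflectiveOperationException {
        return resolve(conf).getDeclaredConstructor().newInstance();
    }
}
```

The asSubclass check is what enforces the "has to extend Fetcher" requirement: 
a misconfigured class name fails at load time instead of later in the job.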

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
