[ 
https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675312#action_12675312
 ] 

Andrzej Bialecki  commented on NUTCH-477:
-----------------------------------------

(auto-review ;) )

After reflecting on this patch for a while, I'm no longer sure the benefits 
are worth the cost. We have a similar feature in URLNormalizers, and I find 
that I use it very rarely. On the downside, it complicates the code and 
configuration considerably.

Perhaps we should limit this patch to just the part that allows early 
termination. I actually use that part more often: it helps to short-circuit 
the filtering chain, either to avoid running costly filters or to enforce 
certain exceptions early in the pipeline. Comments?
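
To make the early-termination part concrete, here is a minimal sketch of how 
an int return code could short-circuit a chain. It is not taken from 
urlfilters.patch: the class name, the interface and the three code values are 
all assumptions made up for illustration. The only thing borrowed from the 
existing API is the convention that a rejected URL comes back as null.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the actual patch.
class FilterChainSketch {
    // Assumed result codes (illustrative values only).
    static final int CONTINUE = 0; // no decision, pass the URL to the next filter
    static final int ACCEPT = 1;   // accept now and skip the remaining filters
    static final int REJECT = 2;   // reject now and skip the remaining filters

    // Hypothetical filter interface that returns a code instead of a String.
    interface CodeFilter {
        int filter(String url);
    }

    // Runs the chain in order. Returns the URL if it is accepted, or null if
    // it is rejected, mirroring the existing String-or-null convention.
    static String run(List<CodeFilter> chain, String url) {
        for (CodeFilter f : chain) {
            int code = f.filter(url);
            if (code == ACCEPT) {
                return url;  // early termination: later filters never run
            }
            if (code == REJECT) {
                return null; // early termination: later filters never run
            }
        }
        return url; // every filter returned CONTINUE, so nothing objected
    }

    public static void main(String[] args) {
        // A cheap allow-list placed first, followed by a stand-in for a costly filter.
        CodeFilter allow = u -> u.startsWith("http://example.org/") ? ACCEPT : CONTINUE;
        CodeFilter costly = u -> u.contains("?") ? REJECT : CONTINUE;
        List<CodeFilter> chain = Arrays.asList(allow, costly);

        System.out.println(run(chain, "http://example.org/a?b")); // accepted by the allow-list
        System.out.println(run(chain, "http://other.test/a?b"));  // rejected by the second filter
    }
}
```

With the cheap allow-list first, an exception is enforced before the costly 
filter is ever called, which is the use case described above.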

> Extend URLFilters to support different filtering chains
> -------------------------------------------------------
>
>                 Key: NUTCH-477
>                 URL: https://issues.apache.org/jira/browse/NUTCH-477
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>         Attachments: urlfilters.patch
>
>
> I propose to make the following changes to URLFilters:
> * extend URLFilters so that they support different filtering rules depending 
> on the context in which they are executed. This mirrors the functionality 
> that URLNormalizers already support.
> * change their return value to an int code, in order to support early 
> termination of long filtering chains.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.