[ 
https://issues.apache.org/jira/browse/NUTCH-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387536#comment-16387536
 ] 

ASF GitHub Bot commented on NUTCH-2522:
---------------------------------------

sebastian-nagel commented on a change in pull request #290: NUTCH-2522
URL: https://github.com/apache/nutch/pull/290#discussion_r172456235
 
 

 ##########
 File path: conf/db-ignore-external-exemptions-bidirectional.txt
 ##########
 @@ -0,0 +1,33 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
 
 Review comment:
   Configuration files should be added as *.template files, which are 
"instantiated" (copied) during the first compilation. Users can then modify 
the content without conflicts or undesired overwrites.
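
The copy-once behaviour described in this review comment can be sketched in Java as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and it is not the mechanism the Nutch build itself uses. It copies each *.template file in a directory to the same name without the suffix, and skips any target that already exists so that user edits are never overwritten.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TemplateInstantiationSketch {
    private static final String SUFFIX = ".template";

    /**
     * Hypothetical helper: copies every "*.template" file in confDir to a
     * file of the same name without the suffix, unless that file exists.
     * Returns the number of files that were newly created.
     */
    static int instantiateTemplates(Path confDir) throws IOException {
        int copied = 0;
        try (DirectoryStream<Path> templates =
                 Files.newDirectoryStream(confDir, "*" + SUFFIX)) {
            for (Path template : templates) {
                String name = template.getFileName().toString();
                Path target = confDir.resolve(
                    name.substring(0, name.length() - SUFFIX.length()));
                // Assumption: an existing file may hold user edits, so keep it.
                if (Files.notExists(target)) {
                    Files.copy(template, target);
                    copied++;
                }
            }
        }
        return copied;
    }
}
```

Running the helper a second time leaves already instantiated files untouched, which is the "no undesired overwrites" property the comment asks for.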

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


>  Bidirectional URL exemption filter
> -----------------------------------
>
>                 Key: NUTCH-2522
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2522
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>            Reporter: Semyon Semyonov
>            Priority: Minor
>
> The current Nutch URL exemption plugin exempts links based on toUrl only. The 
> new plugin uses both fromUrl and toUrl: after a regex transformation, it 
> exempts a link if regex(fromUrl) == regex(toUrl).
> This approach allows more complex URL exemption checks, 
> such as allowing links like:
> http://www.website.com/home -> 
> http://website.com/about (with or without www).
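
The condition regex(fromUrl) == regex(toUrl) from the issue description can be sketched in Java as follows. This is a minimal illustration, not the plugin's actual code: the class name, method names, and the single hard-coded rule are assumptions. The rule reduces a URL to its host with an optional leading "www." removed, so the two URLs in the example above transform to the same string.

```java
import java.util.regex.Pattern;

public class BidirectionalExemptionSketch {
    // Hypothetical rule: capture the host, ignoring the scheme and an
    // optional leading "www.", and discard the rest of the URL.
    private static final Pattern RULE =
        Pattern.compile("^https?://(?:www\\.)?([^/:?#]+).*$");

    /** Applies the regex transformation; non-matching URLs are returned unchanged. */
    static String transform(String url) {
        return RULE.matcher(url).replaceAll("$1");
    }

    /** A link is exempted when both ends transform to the same string. */
    static boolean isExempted(String fromUrl, String toUrl) {
        return transform(fromUrl).equals(transform(toUrl));
    }

    public static void main(String[] args) {
        System.out.println(isExempted(
            "http://www.website.com/home", "http://website.com/about"));
        System.out.println(isExempted(
            "http://www.website.com/home", "http://other.org/"));
    }
}
```

Because both URLs pass through the same transformation, the check is symmetric in fromUrl and toUrl, unlike a filter that inspects toUrl alone.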



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
