Semyon Semyonov created NUTCH-2522:
--------------------------------------
Summary: Bidirectional URL exemption filter
Key: NUTCH-2522
URL: https://issues.apache.org/jira/browse/NUTCH-2522
Project: Nutch
Issue Type: Improvement
Components: plugin
Reporter: Semyon Semyonov
The current Nutch URL exemption plugin decides based on toUrl only. The new
plugin uses both fromUrl and toUrl: it applies a regex transformation to each
and exempts the link when regex(fromUrl) == regex(toUrl).
This approach allows more complex URL exemption checks, such as allowing
links like:
http://www.website.com/home ->
http://website.com/about (with/without www).
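
The condition above can be sketched in a few lines of Java. This is only an
illustration, not the plugin's code: the class name, the method names, and the
rule (a pattern that reduces a URL to its host without a leading "www.") are
assumptions chosen to match the example, and the message does not say how the
real plugin configures its rules.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; all names here are hypothetical.
public class BidirectionalExemptionSketch {

    // Assumed rule: drop the scheme, an optional "www." prefix, and the path,
    // leaving just the host part of the URL.
    private static final Pattern RULE =
        Pattern.compile("^https?://(?:www\\.)?([^/]+).*$");
    private static final String REPLACEMENT = "$1";

    // Applies the regex transformation to one URL.
    // URLs that do not match the rule are returned unchanged.
    public static String transform(String url) {
        return RULE.matcher(url).replaceAll(REPLACEMENT);
    }

    // A link is exempted when both ends transform to the same string,
    // i.e. regex(fromUrl) == regex(toUrl).
    public static boolean isExempted(String fromUrl, String toUrl) {
        return transform(fromUrl).equals(transform(toUrl));
    }

    public static void main(String[] args) {
        // Same host with and without "www.": prints true
        System.out.println(isExempted(
            "http://www.website.com/home", "http://website.com/about"));
        // Different hosts: prints false
        System.out.println(isExempted(
            "http://www.website.com/home", "http://other.org/about"));
    }
}
```

A toUrl-only check cannot express "same site as the source page", because the
decision depends on comparing both ends of the link.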
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)