Github user sebastian-nagel commented on a diff in the pull request:
https://github.com/apache/nutch/pull/89#discussion_r52833922
--- Diff: src/plugin/urlfilter-ignoreexempt/README.md ---
@@ -0,0 +1,52 @@
+urlfilter-ignoreexempt
+======================
+ This plugin allows certain urls to be exempted when the external links are configured to be ignored.
+ This is useful when focused crawl is setup but some resources like static files are linked from CDNs (external domains).
+
+How to enable ?
+==============
+Add `urlfilter-ignoreexempt` value to `plugin.includes` property
+```xml
+<property>
+ <name>plugin.includes</name>
+ <value>protocol-http|urlfilter-(regex|ignoreexempt)...</value>
+</property>
+```
+
+How to configure rules?
+================
+
+open `conf/db-ignore-external-exemptions.txt` and add rules
+
+#### Format :
+
+```
+UrlRegex1
+UrlRegex2
+UrlRegex3
+```
+
+
+#### NOTE ::
+ 1. If an url matches any of the given regexps then that url is exempted.
+ 2. \# in the beginning makes it a comment line
+ 3. To Test the regex, update this file and use the below command
+ bin/nutch plugin urlfilter-ignoreexempt org.apache.nutch.urlfilter.ignoreexempt.ExemptionUrlFilter <URL>
+
+
+#### Example :
+
+ To exempt urls ending with image extensions, use this rule
+
+`.*\.(jpg|JPG|png$|PNG|gif|GIF)$# Testing`
--- End diff --
Ditto.
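
To illustrate how the example rule in the diff above would behave, here is a minimal sketch. It assumes each rule line is compiled with `java.util.regex.Pattern` and matched against the whole URL. The class `ExemptionRuleSketch`, the helper `isExempted`, and the sample URLs are hypothetical and not part of the plugin. Whether the plugin strips a trailing `# ...` comment from a rule line is not shown in this diff, so the sketch checks the line both with and without that trailing text.

```java
import java.util.List;
import java.util.regex.Pattern;

public class ExemptionRuleSketch {

    // Hypothetical helper (not the plugin's API): a URL counts as exempted
    // if it fully matches at least one of the given rules.
    static boolean isExempted(List<Pattern> rules, String url) {
        for (Pattern rule : rules) {
            if (rule.matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The example rule without the trailing "# Testing" text.
        List<Pattern> cleanRule =
            List.of(Pattern.compile(".*\\.(jpg|JPG|png|PNG|gif|GIF)$"));

        // The example line compiled verbatim. Without Pattern.COMMENTS,
        // '#' is a literal character, so "# Testing" becomes part of the regex.
        List<Pattern> verbatimRule =
            List.of(Pattern.compile(".*\\.(jpg|JPG|png$|PNG|gif|GIF)$# Testing"));

        String image = "http://cdn.example.com/logo.png";  // assumed sample URL
        String page = "http://example.com/index.html";     // assumed sample URL

        System.out.println(isExempted(cleanRule, image));
        System.out.println(isExempted(cleanRule, page));
        System.out.println(isExempted(verbatimRule, image));
    }
}
```

Under these assumptions, only the rule without the trailing text matches the image URL, since nothing can follow the end-of-input anchor `$`.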