Add a sample config

Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/9a284c0d
Tree: http://git-wip-us.apache.org/repos/asf/nutch/tree/9a284c0d
Diff: http://git-wip-us.apache.org/repos/asf/nutch/diff/9a284c0d

Branch: refs/heads/master
Commit: 9a284c0d6d2aec86b00016a8abeddc07e5292ee9
Parents: 2015703
Author: Thamme Gowda <[email protected]>
Authored: Sun Feb 28 19:29:09 2016 -0800
Committer: Thamme Gowda <[email protected]>
Committed: Sun Feb 28 19:29:09 2016 -0800

----------------------------------------------------------------------
 conf/db-ignore-external-exemptions.txt | 33 +++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/nutch/blob/9a284c0d/conf/db-ignore-external-exemptions.txt
----------------------------------------------------------------------
diff --git a/conf/db-ignore-external-exemptions.txt b/conf/db-ignore-external-exemptions.txt
new file mode 100644
index 0000000..46bfdb0
--- /dev/null
+++ b/conf/db-ignore-external-exemptions.txt
@@ -0,0 +1,33 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#
+# Exemption rules to db.ignore.external.links
+
+
+# Format :
+#--------
+# The format is the same as `regex-urlfilter.txt`.
+# Each non-comment, non-blank line contains a regular expression
+# prefixed by '+' or '-'.  The first matching pattern in the file
+# determines whether a URL is exempted or ignored.  If no pattern
+# matches, the URL is ignored.
+
+
+
+# Example 1:
+#----------
+# To exempt URLs ending with image extensions, uncomment the line below
+# +(?i)\.(jpg|png|gif)$
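
The first-match semantics described in the "Format" comment can be sketched as follows. This is a minimal Python illustration, not the Nutch implementation (which is Java); the names `parse_rules` and `is_exempted` and the example URLs are hypothetical, and it assumes a pattern may match anywhere in the URL, as with a regex search.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        # True means "exempt this URL", False means "ignore it".
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def is_exempted(url, rules):
    """Return the verdict of the first matching rule; ignore if none match."""
    for exempt, pattern in rules:
        if pattern.search(url):
            return exempt
    return False


# The sample rule from the config file, uncommented.
RULES = parse_rules(r"""
# exempt image URLs
+(?i)\.(jpg|png|gif)$
""")

print(is_exempted("http://example.com/a/logo.PNG", RULES))
print(is_exempted("http://example.com/page.html", RULES))
```

Because the first matching line wins, a narrower '-' rule placed above a broader '+' rule takes precedence over it.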
