Add a sample config
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/9a284c0d
Tree: http://git-wip-us.apache.org/repos/asf/nutch/tree/9a284c0d
Diff: http://git-wip-us.apache.org/repos/asf/nutch/diff/9a284c0d

Branch: refs/heads/master
Commit: 9a284c0d6d2aec86b00016a8abeddc07e5292ee9
Parents: 2015703
Author: Thamme Gowda <[email protected]>
Authored: Sun Feb 28 19:29:09 2016 -0800
Committer: Thamme Gowda <[email protected]>
Committed: Sun Feb 28 19:29:09 2016 -0800

----------------------------------------------------------------------
 conf/db-ignore-external-exemptions.txt | 33 +++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/nutch/blob/9a284c0d/conf/db-ignore-external-exemptions.txt
----------------------------------------------------------------------
diff --git a/conf/db-ignore-external-exemptions.txt b/conf/db-ignore-external-exemptions.txt
new file mode 100644
index 0000000..46bfdb0
--- /dev/null
+++ b/conf/db-ignore-external-exemptions.txt
@@ -0,0 +1,33 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#
+# Exemption rules to db.ignore.external.links
+
+
+# Format :
+#--------
+# The format is same same as `regex-urlfilter.txt`.
+# Each non-comment, non-blank line contains a regular expression
+# prefixed by '+' or '-'. The first matching pattern in the file
+# determines whether a URL is exempted or ignored. If no pattern
+# matches, the URL is ignored.
+
+
+
+# Example 1:
+#----------
+# To exempt urls ending with image extensions, uncomment the below line
+# +(?i)\.(jpg|png|gif)$
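
The "Format" comment in the new file describes first-match semantics: rules are read top to bottom, the first pattern that matches decides, and a URL that matches nothing is ignored. The sketch below illustrates that evaluation order in Python. It is not Nutch's implementation (Nutch is written in Java); `parse_rules` and `is_exempted` are hypothetical helper names, and matching the pattern anywhere in the URL (rather than against the whole URL) is an assumption.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blank lines and '#' comments.

    Hypothetical helper: returns a list of (exempt, compiled_pattern) pairs
    in file order.
    """
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def is_exempted(url, rules):
    """Return the verdict of the first matching rule; no match means ignored."""
    for exempt, pattern in rules:
        # Assumption: a pattern may match anywhere in the URL.
        if pattern.search(url):
            return exempt
    return False


# "Example 1" from the sample file, with the rule line uncommented.
SAMPLE = r"""
# To exempt urls ending with image extensions
+(?i)\.(jpg|png|gif)$
"""

rules = parse_rules(SAMPLE)
print(is_exempted("http://example.org/img/logo.PNG", rules))
print(is_exempted("http://example.org/index.html", rules))
```

Because the first match wins, rule order matters: placing a `-` rule before a broader `+` rule carves an exception out of it.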
