Giuseppe Totaro created NUTCH-1995:
--------------------------------------
Summary: Add support for wildcard to http.robot.rules.whitelist
Key: NUTCH-1995
URL: https://issues.apache.org/jira/browse/NUTCH-1995
Project: Nutch
Issue Type: Improvement
Components: robots
Affects Versions: 1.10
Reporter: Giuseppe Totaro
The {{http.robot.rules.whitelist}} configuration parameter allows specifying a
comma-separated list of hostnames or IP addresses for which robot rules
(robots.txt) parsing is skipped.
Adding wildcard support to {{http.robot.rules.whitelist}} would simplify the
configuration, for example when many hostnames or addresses need to be
listed. Here is an example:
{noformat}
<property>
<name>http.robot.rules.whitelist</name>
<value>*.sample.com</value>
<description>Comma separated list of hostnames or IP addresses to ignore
robot rules parsing for. Use with care and only if you are explicitly
allowed by the site owner to ignore the site's robots.txt!
</description>
</property>
{noformat}
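To make the proposed semantics concrete, here is a minimal sketch of how a whitelist entry such as {{*.sample.com}} could be matched against a hostname. The class {{WildcardWhitelist}} and its methods are hypothetical and are not part of Nutch's API. The sketch assumes that {{*}} matches any run of characters (dots included), that matching is case-insensitive, and that {{*.sample.com}} matches subdomains such as {{www.sample.com}} but not the bare {{sample.com}}. Entries without {{*}} keep today's exact-match behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (NOT part of Nutch): checks a hostname or IP address
 * against a comma-separated whitelist where '*' matches any run of characters.
 */
public class WildcardWhitelist {

    private final List<Pattern> patterns = new ArrayList<>();

    public WildcardWhitelist(String commaSeparated) {
        for (String entry : commaSeparated.split(",")) {
            String trimmed = entry.trim().toLowerCase(Locale.ROOT);
            if (trimmed.isEmpty()) {
                continue; // skip empty entries, e.g. from a trailing comma
            }
            patterns.add(toPattern(trimmed));
        }
    }

    /** Converts a glob such as "*.sample.com" into a regex; literal parts are quoted. */
    private static Pattern toPattern(String glob) {
        String[] parts = glob.split("\\*", -1); // -1 keeps leading/trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' becomes "any characters"
            }
            regex.append(Pattern.quote(parts[i])); // dots etc. are matched literally
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns true if the whole host matches at least one whitelist entry. */
    public boolean isWhitelisted(String host) {
        String normalized = host.toLowerCase(Locale.ROOT);
        for (Pattern p : patterns) {
            if (p.matcher(normalized).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        WildcardWhitelist wl = new WildcardWhitelist("*.sample.com, 192.168.0.1");
        System.out.println(wl.isWhitelisted("www.sample.com"));  // true
        System.out.println(wl.isWhitelisted("sample.com"));      // false
        System.out.println(wl.isWhitelisted("evilsample.com"));  // false
        System.out.println(wl.isWhitelisted("192.168.0.1"));     // true
    }
}
```

Quoting the literal parts with {{Pattern.quote}} means a dot in the configured value only matches a dot, so {{*.sample.com}} cannot accidentally whitelist a host like {{evilsample.com}}. Whether the bare domain should also match is a design decision the implementation would still need to settle.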