+1, please commit! Thanks, Seb.
> On Apr 17, 2015, at 4:15 PM, Sebastian Nagel (JIRA) <[email protected]> wrote:
>
>
>     [ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Sebastian Nagel updated NUTCH-1927:
> -----------------------------------
>     Attachment: test_NUTCH-1927.2015-04-17.txt
>                 NUTCH-1927.2015-04-17.patch
>
> Patch to log more verbosely, here for a test on "localhost":
> {noformat}
> 2015-04-17 21:58:03,902 INFO protocol.RobotRulesParser - Whitelisted hosts: [localhost]
> ...
> 2015-04-17 21:58:03,906 INFO api.HttpRobotRulesParser - Whitelisted host found for: http://localhost/foo/index.html
> 2015-04-17 21:58:03,906 INFO api.HttpRobotRulesParser - Ignoring robots.txt for all URLs from whitelisted host: localhost
> {noformat}
>
> RobotsRuleParser now implements Tool to leverage testing: properties can be
> passed via "-Dprop=val", see attached log from test session.
>
>> Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing
>> ---------------------------------------------------------------------------
>>
>>                 Key: NUTCH-1927
>>                 URL: https://issues.apache.org/jira/browse/NUTCH-1927
>>             Project: Nutch
>>          Issue Type: New Feature
>>          Components: fetcher
>>            Reporter: Chris A. Mattmann
>>            Assignee: Chris A. Mattmann
>>              Labels: available, patch
>>             Fix For: 1.10
>>
>>         Attachments: NUTCH-1927.2015-04-16.patch,
>>                      NUTCH-1927.2015-04-17.patch,
>>                      NUTCH-1927.Mattmann.041115.patch.txt,
>>                      NUTCH-1927.Mattmann.041215.patch.txt,
>>                      NUTCH-1927.Mattmann.041415.patch.txt,
>>                      test_NUTCH-1927.2015-04-17.txt
>>
>>
>> Based on discussion on the dev list, to use Nutch for some security research
>> valid use cases (DDoS; DNS and other testing), I am going to create a patch
>> that allows a whitelist:
>> {code:xml}
>> <property>
>>   <name>robot.rules.whitelist</name>
>>   <value>132.54.99.22,hostname.apache.org,foo.jpl.nasa.gov</value>
>>   <description>Comma separated list of hostnames or IP addresses to ignore
>>   robot rules parsing for.
>>   </description>
>> </property>
>> {code}
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
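
For anyone skimming the thread, here is a rough sketch of how a comma-separated value like the one in robot.rules.whitelist could be matched against the host of a URL. This is not the code from the attached patch: the class RobotWhitelistSketch and its two methods are hypothetical names made up for illustration, and the sketch assumes a plain, case-insensitive exact match on the host name or IP literal.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration only; NOT the implementation in the NUTCH-1927 patch.
final class RobotWhitelistSketch {

  // Split a comma-separated property value into a set of lower-cased hosts/IPs.
  // Surrounding whitespace and empty entries are dropped.
  static Set<String> parseWhitelist(String value) {
    if (value == null) {
      return Collections.emptySet();
    }
    return Arrays.stream(value.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .map(s -> s.toLowerCase(Locale.ROOT))
        .collect(Collectors.toSet());
  }

  // True if the URL's host (name or IP literal) appears in the whitelist.
  // URI.create throws IllegalArgumentException for a malformed URL.
  static boolean isWhitelisted(String url, Set<String> whitelist) {
    String host = URI.create(url).getHost();
    return host != null && whitelist.contains(host.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    // Value taken from the example property in the issue description.
    Set<String> whitelist =
        parseWhitelist("132.54.99.22,hostname.apache.org,foo.jpl.nasa.gov");
    System.out.println(isWhitelisted("http://hostname.apache.org/foo/index.html", whitelist));
    System.out.println(isWhitelisted("http://example.org/", whitelist));
  }
}
```

A caller would presumably skip the robots.txt fetch and parse whenever isWhitelisted returns true, which is what the "Ignoring robots.txt for all URLs from whitelisted host" log line above suggests. Whether the real patch also resolves hostnames to IPs is not something this sketch assumes.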

