Author: mattmann
Date: Sat Apr 18 23:49:52 2015
New Revision: 1674588
URL: http://svn.apache.org/r1674588
Log:
Tickle to close out the pull request committed to 2.x by Meabed. This closes #8.
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
Modified: nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?rev=1674588&r1=1674587&r2=1674588&view=diff
==============================================================================
--- nutch/trunk/CHANGES.txt (original)
+++ nutch/trunk/CHANGES.txt Sat Apr 18 23:49:52 2015
@@ -1,7 +1,7 @@
Nutch Change Log
Nutch Current Development 1.10-SNAPSHOT
-
+
* NUTCH-1854 bin/crawl fails with a parsing fetcher (Asitang Mishra via snagel)
* NUTCH-1989 Handling invalid URLs in CommonCrawlDataDumper (Giuseppe Totaro
via mattmann)
Modified: nutch/trunk/conf/nutch-default.xml
URL: http://svn.apache.org/viewvc/nutch/trunk/conf/nutch-default.xml?rev=1674588&r1=1674587&r2=1674588&view=diff
==============================================================================
--- nutch/trunk/conf/nutch-default.xml (original)
+++ nutch/trunk/conf/nutch-default.xml Sat Apr 18 23:49:52 2015
@@ -119,7 +119,7 @@
<property>
<name>http.robot.rules.whitelist</name>
- <value></value>
+ <value>baron.pagemewhen.com</value>
<description>Comma separated list of hostnames or IP addresses to ignore
robot rules parsing for. Use with care and only if you are explicitly
allowed by the site owner to ignore the site's robots.txt!