Author: lewismc
Date: Mon May 21 18:25:09 2012
New Revision: 1341137
URL: http://svn.apache.org/viewvc?rev=1341137&view=rev
Log:
commit to address NUTCH-1364 and update to CHANGES.txt
Modified:
nutch/branches/nutchgora/CHANGES.txt
nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java
Modified: nutch/branches/nutchgora/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/branches/nutchgora/CHANGES.txt?rev=1341137&r1=1341136&r2=1341137&view=diff
==============================================================================
--- nutch/branches/nutchgora/CHANGES.txt (original)
+++ nutch/branches/nutchgora/CHANGES.txt Mon May 21 18:25:09 2012
@@ -2,6 +2,8 @@ Nutch Change Log
Release nutchgora - Current Development
+* NUTCH-1364 Add a counter for malformed urls (Jason Trost via lewismc)
+
* NUTCH-1361 Fix mishandling of malformed urls in generator job (Jason Trost
via lewismc)
* NUTCH-1360 Support the storing of IP address connected to when web crawling
(lewismc)
Modified: nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java
URL: http://svn.apache.org/viewvc/nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java?rev=1341137&r1=1341136&r2=1341137&view=diff
==============================================================================
--- nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java (original)
+++ nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java Mon May 21 18:25:09 2012
@@ -77,6 +77,7 @@ extends GoraReducer<SelectorEntry, WebPa
       try {
         context.write(TableUtil.reverseUrl(key.url), page);
       } catch (MalformedURLException e) {
+        context.getCounter("Generator", "MALFORMED_URL").increment(1);
         continue;
       }
       context.getCounter("Generator", "GENERATE_MARK").increment(1);
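
For readers without the surrounding reducer at hand, the pattern in this hunk (count a malformed URL before skipping it, and count successfully generated records separately) can be sketched in isolation. This is only an illustration using the JDK: the class MalformedUrlCounterSketch, its Result holder, and the in-memory counter map are hypothetical stand-ins for the Hadoop context and counters, and parsing with java.net.URL merely stands in for the step that may throw MalformedURLException in the real code.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MalformedUrlCounterSketch {

  /** Hypothetical holder for emitted records and counter totals. */
  public static final class Result {
    public final List<String> emitted = new ArrayList<>();
    public final Map<String, Long> counters = new TreeMap<>();
  }

  @SuppressWarnings("deprecation") // new URL(String) is deprecated on recent JDKs
  public static Result generate(List<String> urls) {
    Result result = new Result();
    for (String url : urls) {
      try {
        // Stand-in for the write step: parsing throws on a malformed URL.
        result.emitted.add(new URL(url).getHost());
      } catch (MalformedURLException e) {
        // Count the skipped record instead of dropping it silently.
        result.counters.merge("MALFORMED_URL", 1L, Long::sum);
        continue;
      }
      // Only reached when the record was emitted successfully.
      result.counters.merge("GENERATE_MARK", 1L, Long::sum);
    }
    return result;
  }

  public static void main(String[] args) {
    Result r = generate(List.of("http://example.com/a", "not a url", "http://example.org/"));
    System.out.println(r.counters);
  }
}
```

Because the increment sits inside the catch block ahead of the continue, every skipped record is accounted for, so the two counters together should add up to the number of input records.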