Author: snagel
Date: Wed Oct 10 21:16:09 2012
New Revision: 1396801

URL: http://svn.apache.org/viewvc?rev=1396801&view=rev
Log:
NUTCH-1344 BasicURLNormalizer to normalize https same as http

Modified:
    nutch/trunk/CHANGES.txt
    nutch/trunk/src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java

Modified: nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?rev=1396801&r1=1396800&r2=1396801&view=diff
==============================================================================
--- nutch/trunk/CHANGES.txt (original)
+++ nutch/trunk/CHANGES.txt Wed Oct 10 21:16:09 2012
@@ -2,6 +2,8 @@ Nutch Change Log
 
 (trunk) Current Development:
 
+* NUTCH-1344 BasicURLNormalizer to normalize https same as http
+
 * NUTCH-706 Url regex normalizer: pattern for session id removal not to match "newsId" (Meghna Kukreja via snagel)
 
 * NUTCH-1415 release packages to contain top level folder apache-nutch-x.x (snagel)

Modified: nutch/trunk/src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
URL: http://svn.apache.org/viewvc/nutch/trunk/src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java?rev=1396801&r1=1396800&r2=1396801&view=diff
==============================================================================
--- nutch/trunk/src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java (original)
+++ nutch/trunk/src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java Wed Oct 10 21:16:09 2012
@@ -104,7 +104,7 @@ public class BasicURLNormalizer extends 
         if (!urlString.startsWith(protocol))        // protocol was lowercased
             changed = true;
 
-        if ("http".equals(protocol) || "ftp".equals(protocol)) {
+        if ("http".equals(protocol) || "https".equals(protocol) || "ftp".equals(protocol)) {
 
             if (host != null) {
                 String newHost = host.toLowerCase();    // lowercase host
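
The effect of the one-line change can be illustrated with a small standalone sketch. It is hypothetical and heavily simplified (the class name HttpsNormalizeSketch and method normalize are made up for illustration, not taken from the Nutch source). It mirrors only what the diff shows: the protocol is lowercased, and for http, https (newly added) and ftp the host is lowercased. Any other normalization steps the real plugin performs are deliberately left out, and this sketch also ignores the URL fragment.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

/**
 * Hypothetical sketch, not the Nutch BasicURLNormalizer: shows how adding
 * "https" to the protocol check makes https hosts get lowercased like http.
 */
public class HttpsNormalizeSketch {

    static String normalize(String urlString) {
        try {
            URL url = new URL(urlString.trim());
            // java.net.URL already lowercases the protocol while parsing
            String protocol = url.getProtocol().toLowerCase(Locale.ROOT);
            String host = url.getHost();
            int port = url.getPort();
            String file = url.getFile(); // path plus query; this sketch ignores the fragment

            // Condition from the patch: https now takes the same branch as http and ftp
            if ("http".equals(protocol) || "https".equals(protocol)
                    || "ftp".equals(protocol)) {
                if (host != null) {
                    host = host.toLowerCase(Locale.ROOT); // lowercase host
                }
            }
            return new URL(protocol, host, port, file).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Malformed URL: " + urlString, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("HTTPS://WWW.Example.COM/Index.html"));
        System.out.println(normalize("http://WWW.Example.COM/Index.html"));
    }
}
```

Both calls print the same host in lowercase (https://www.example.com/Index.html and http://www.example.com/Index.html), while the path keeps its case: only the scheme and host of a URL are case-insensitive, so only those are safe to lowercase.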
