I tried to compile trunk (revision 579849) and the build failed in
HtmlParser.  The 4th argument to the String constructor on line 84
has to be a String naming the charset, not a Charset object.  I made
the change locally, but I can't commit it, so here is the diff:

Index: src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
===================================================================
--- src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java	(revision 579846)
+++ src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java	(working copy)
@@ -81,7 +81,12 @@
    // to just inflate each byte to a 16-bit value by padding.
    // For instance, the sequence {0x41, 0x82, 0xb7} will be turned into
    // {U+0041, U+0082, U+00B7}.
-    String str = new String(content, 0, length, Charset.forName("ASCII"));
+    String str = "";
+    try {
+       str = new String(content, 0, length, Charset.forName("ASCII").toString());
+    } catch (UnsupportedEncodingException e) {
+       e.printStackTrace();
+    }

    Matcher metaMatcher = metaPattern.matcher(str);
    String encoding = null;
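
For reference, the String(byte[], int, int, Charset) overload only
exists from Java 6 on, which would explain the compile error on an
older JDK.  Below is a minimal standalone sketch of the same idea as
the patch, not the actual Nutch code: the class and method names
(AsciiDecodeSketch, decodeAscii) are made up for illustration, and it
passes the charset name "US-ASCII" directly instead of going through
Charset.forName(...).toString().

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not part of Nutch: decodes the first `length`
// bytes of `content` using the String-named charset overload, which
// is also available on JDKs older than Java 6.
public class AsciiDecodeSketch {

    static String decodeAscii(byte[] content, int length) {
        try {
            // The 4th argument is a charset *name*, not a Charset object.
            return new String(content, 0, length, "US-ASCII");
        } catch (UnsupportedEncodingException e) {
            // US-ASCII is one of the charsets every Java platform must
            // support, so this branch should never be reached.
            throw new IllegalStateException("US-ASCII not supported", e);
        }
    }

    public static void main(String[] args) {
        byte[] content = {0x3c, 0x68, 0x31, 0x3e}; // the bytes of "<h1>"
        System.out.println(decodeAscii(content, content.length));
    }
}
```

Rethrowing instead of calling printStackTrace() keeps str from
silently ending up empty, but that is a design choice on top of the
patch, not something the patch itself does.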


Thanks,
Ned
