Hi,

Thanks for testing!

1. Is
     /Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
   the "real" path, i.e., are there no symbolic links in the path?
   The command
     readlink -f /Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
   should show you whether this is the case or not.
   Parse objects are stored in the ParseResult keyed by the "real" path,
   so this may cause an NPE when no Parse is available under the original path.

2. Unfortunately, the log output is ambiguous: there are two places in
   ParserChecker where exceptions are caught and logged with the same message.
   Can you apply the attached patch and test again? It only makes the log
   messages more verbose.
   If you have time, please open a Jira issue to improve the logging in this case.

Thanks,
Sebastian

On 10/26/2014 02:24 AM, Mengying Wang wrote:
> Hi Sebastian,
> 
> I have downloaded the Nutch source code from GitHub
> (https://github.com/apache/nutch), applied the patches (NUTCH-1879 and
> NUTCH-1880), and then reinstalled Nutch. The good news is that all URLs now
> contain only one slash. Unfortunately, a java.lang.NullPointerException
> warning/error occurs for both the parsechecker and indexchecker commands.
> 
> Below is the running log:
> 
> $ ./nutch parsechecker
> "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> signature: 17bdb44990391c96bb8d48d1802ff11c
> Couldn't pass score, url
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> (java.lang.NullPointerException)
> ---------
> Url
> ---------------
> 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/
> ---------
> ParseData
> ---------
> 
> Version: 5
> Status: success(1,0)
> Title: Index of 
> /Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml
> Outlinks: 2
>   outlink: toUrl: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/
> anchor: ../
>   outlink: toUrl:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/monitor.xml
> anchor: monitor.xml
> Content Metadata: Content-Length=352 nutch.crawl.score=0.0 Last-Modified=Tue, 
> 14 Oct 2014 20:05:50
> GMT Content-Type=text/html 
> Parse Metadata: CharEncodingForConversion=windows-1252 
> OriginalCharEncoding=windows-1252 
> 
> 
> $ ./nutch indexchecker
> "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:139)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at 
> org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:177)
> 
> Thanks.
> Mengying (Angela) Wang

diff --git src/java/org/apache/nutch/parse/ParserChecker.java src/java/org/apache/nutch/parse/ParserChecker.java
index 083af2d..0e13d61 100644
--- src/java/org/apache/nutch/parse/ParserChecker.java
+++ src/java/org/apache/nutch/parse/ParserChecker.java
@@ -24,6 +24,7 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.io.Text;
+import org.apache.hadoop.util.StringUtils;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;
 import org.apache.nutch.crawl.CrawlDatum;
@@ -164,7 +165,8 @@ public class ParserChecker implements Tool {
       scfilters.passScoreBeforeParsing(turl, cd, content);
     } catch (Exception e) {
       if (LOG.isWarnEnabled()) {
-        LOG.warn("Couldn't pass score, url " + turl.toString() + " (" + e + ")");
+        LOG.warn("Couldn't pass score before parsing, url " + turl + " (" + e + ")");
+        LOG.warn(StringUtils.stringifyException(e));
       }
     }    
     
@@ -189,7 +191,8 @@ public class ParserChecker implements Tool {
       scfilters.passScoreAfterParsing(turl, content, parseResult.get(turl));
     } catch (Exception e) {
       if (LOG.isWarnEnabled()) {
-        LOG.warn("Couldn't pass score, url " + turl + " (" + e + ")");
+        LOG.warn("Couldn't pass score after parsing, url " + turl + " (" + e + ")");
+        LOG.warn(StringUtils.stringifyException(e));
       }
     }
 
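
For readers without a Hadoop checkout at hand: the extra LOG.warn line in the
patch prints the full stack trace of the caught exception. A rough
plain-JDK equivalent is sketched below. The class StackTraceLogging, the
method stringify, and the map with its two paths are made up for this
illustration; the map only stands in for results keyed by "real" path and is
not the Nutch ParseResult API.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not part of Nutch or Hadoop.
public class StackTraceLogging {

  /** Renders the full stack trace of a throwable as a string. */
  static String stringify(Throwable t) {
    StringWriter buffer = new StringWriter();
    try (PrintWriter writer = new PrintWriter(buffer)) {
      t.printStackTrace(writer);
    }
    return buffer.toString();
  }

  public static void main(String[] args) {
    // Stand-in for results stored under the "real" path (assumed paths)
    Map<String, String> results = new HashMap<>();
    results.put("/real/path/", "parse data");
    try {
      // Lookup by the original path returns null, so length() throws an NPE
      results.get("/link/path/").length();
    } catch (Exception e) {
      // Short message first, then the stack trace showing where it happened
      System.err.println("Couldn't pass score after parsing (" + e + ")");
      System.err.println(stringify(e));
    }
  }
}
```

The stack trace shows the line where the exception was thrown, which the
one-line message alone does not.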
