Hi,
thanks for testing!
1. Is
/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
the "real" path, i.e., are there no symbolic links in the path?
The command
  readlink -f /Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
should show you whether this is the case: if it prints a different path,
there is a symbolic link somewhere in the path.
Because the Parse objects are stored by "real" path in the ParseResult,
this may cause an NPE when no result is available for the original
path.
2. Unfortunately, the log output is ambiguous: there are two places in
ParserChecker where exceptions are caught with the same log message.
Can you apply the attached patch and test again? It only makes the log
messages more verbose.
If you have time, please open a Jira issue to improve the logging in this case.
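
Regarding point 1: in case readlink -f is not supported by your system's
readlink, the same check can be done from Java. Below is only a minimal
sketch, not code from Nutch. The class RealPathCheck and its helper methods
are hypothetical names, the map is a plain stand-in for "results stored by
real path" (it is not the Nutch ParseResult class), and the assumption that
cas-curator is a symbolic link to cas-curator-0.6 is just my guess from the
Url printed in your log.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class RealPathCheck {

    /** Resolves all symbolic links, similar in spirit to `readlink -f`. */
    static Path realPath(Path p) {
        try {
            return p.toRealPath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the absolute, normalized path is identical to its real path. */
    static boolean isRealPath(Path p) {
        return p.toAbsolutePath().normalize().equals(realPath(p));
    }

    /** Demo layout only: creates <tmp>/cas-curator-0.6 and a symlink <tmp>/cas-curator to it. */
    static Path makeDemoLink() {
        try {
            Path base = Files.createTempDirectory("realpath-demo").toRealPath();
            Path target = Files.createDirectory(base.resolve("cas-curator-0.6"));
            return Files.createSymbolicLink(base.resolve("cas-curator"), target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path link = makeDemoLink();
        Path real = realPath(link);
        System.out.println("requested path: " + link);
        System.out.println("real path:      " + real);
        System.out.println("is real path:   " + isRealPath(link));

        // Hypothetical stand-in: results are stored under the real path only.
        Map<String, String> resultsByPath = new HashMap<>();
        resultsByPath.put(real.toString(), "parse of " + real);

        // A lookup by the requested (symlinked) path finds no entry, so
        // calling a method on the returned value would throw an NPE.
        System.out.println("lookup by requested path: " + resultsByPath.get(link.toString()));
        System.out.println("lookup by real path:      " + resultsByPath.get(real.toString()));
    }
}
```

To check your own directory, pass its path to isRealPath() instead of the
demo link. If it returns false, try running parsechecker with the path
returned by realPath().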
Thanks,
Sebastian
On 10/26/2014 02:24 AM, Mengying Wang wrote:
> Hi Sebastian,
>
> I have downloaded the Nutch source code from GitHub
> (https://github.com/apache/nutch), applied the patches (NUTCH-1879 and
> NUTCH-1880), and then reinstalled Nutch. Now the good news is that all
> URLs contain only 1 slash. But unfortunately, a
> java.lang.NullPointerException warning/error occurs for both the
> parsechecker and indexchecker commands.
>
> Below is the running log:
>
> $ ./nutch parsechecker
> "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> signature: 17bdb44990391c96bb8d48d1802ff11c
> Couldn't pass score, url
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> (java.lang.NullPointerException)
> ---------
> Url
> ---------------
>
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/
> ---------
> ParseData
> ---------
>
> Version: 5
> Status: success(1,0)
> Title: Index of
> /Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml
> Outlinks: 2
> outlink: toUrl:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/
> anchor: ../
> outlink: toUrl:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/monitor.xml
> anchor: monitor.xml
> Content Metadata: Content-Length=352 nutch.crawl.score=0.0 Last-Modified=Tue,
> 14 Oct 2014 20:05:50
> GMT Content-Type=text/html
> Parse Metadata: CharEncodingForConversion=windows-1252
> OriginalCharEncoding=windows-1252
>
>
> $ ./nutch indexchecker
> "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> Exception in thread "main" java.lang.NullPointerException
> at
> org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:139)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at
> org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:177)
>
> Thanks.
> Mengying (Angela) Wang
diff --git src/java/org/apache/nutch/parse/ParserChecker.java src/java/org/apache/nutch/parse/ParserChecker.java
index 083af2d..0e13d61 100644
--- src/java/org/apache/nutch/parse/ParserChecker.java
+++ src/java/org/apache/nutch/parse/ParserChecker.java
@@ -24,6 +24,7 @@ import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
+import org.apache.hadoop.util.StringUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.nutch.crawl.CrawlDatum;
@@ -164,7 +165,8 @@ public class ParserChecker implements Tool {
scfilters.passScoreBeforeParsing(turl, cd, content);
} catch (Exception e) {
if (LOG.isWarnEnabled()) {
- LOG.warn("Couldn't pass score, url " + turl.toString() + " (" + e + ")");
+ LOG.warn("Couldn't pass score before parsing, url " + turl + " (" + e + ")");
+ LOG.warn(StringUtils.stringifyException(e));
}
}
@@ -189,7 +191,8 @@ public class ParserChecker implements Tool {
scfilters.passScoreAfterParsing(turl, content, parseResult.get(turl));
} catch (Exception e) {
if (LOG.isWarnEnabled()) {
- LOG.warn("Couldn't pass score, url " + turl + " (" + e + ")");
+ LOG.warn("Couldn't pass score after parsing, url " + turl + " (" + e + ")");
+ LOG.warn(StringUtils.stringifyException(e));
}
}