Hi Sebastian,

I downloaded the Nutch source code from GitHub
(https://github.com/apache/nutch), applied the patches (NUTCH-1879 and
NUTCH-1880), and then reinstalled Nutch. The good news is that all URLs
now contain only one slash. Unfortunately, a java.lang.NullPointerException
occurs with both the parsechecker and indexchecker commands.

Below is the output from running both commands:

$ ./nutch parsechecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
contentType: text/html
signature: 17bdb44990391c96bb8d48d1802ff11c
Couldn't pass score, url file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/ (java.lang.NullPointerException)
---------
Url
---------------

file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/
---------
ParseData
---------

Version: 5
Status: success(1,0)
Title: Index of /Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml
Outlinks: 2
  outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/ anchor: ../
  outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/monitor.xml anchor: monitor.xml
Content Metadata: Content-Length=352 nutch.crawl.score=0.0 Last-Modified=Tue, 14 Oct 2014 20:05:50 GMT Content-Type=text/html
Parse Metadata: CharEncodingForConversion=windows-1252 OriginalCharEncoding=windows-1252


$ ./nutch indexchecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
contentType: text/html
Exception in thread "main" java.lang.NullPointerException
	at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:139)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:177)

Thanks.
Mengying (Angela) Wang