[ 
https://issues.apache.org/jira/browse/NUTCH-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191369#comment-14191369
 ] 

Mengying Wang edited comment on NUTCH-1884 at 10/31/14 5:57 AM:
----------------------------------------------------------------

It turns out that this is not a bug. The path 
/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/ is 
not a "real" path, i.e., it contains symbolic links. Parse objects are stored 
in the ParseResult under the "real" path, so no ParseResult is available for 
the original path, and looking it up there may cause an NPE. Using the real 
path as the URL makes the NPE disappear.
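
For anyone hitting the same problem, here is a minimal sketch of resolving the 
symbolic links before handing the URL to parsechecker or indexchecker. It is 
not part of Nutch: the class name RealPathUrl and the method toRealFileUrl are 
made up for illustration, and the single-slash "file:/..." form is assumed 
only because it matches the log output in this issue. It relies on the 
standard java.nio.file API (Java 7+) and assumes Unix-style paths.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not a Nutch class: turns a local path that may contain
// symbolic links into a "file:" URL built from the resolved ("real") path.
final class RealPathUrl {

    static String toRealFileUrl(String localPath) throws IOException {
        // toRealPath() follows symbolic links and throws if the path does not exist.
        Path real = Paths.get(localPath).toRealPath();

        // Assumption: Unix-style absolute path, so this yields "file:/Users/...".
        String url = "file:" + real.toString();

        // Directories get a trailing slash, as in the URLs shown in this issue.
        if (Files.isDirectory(real) && !url.endsWith("/")) {
            url += "/";
        }
        return url;
    }

    public static void main(String[] args) throws IOException {
        // Print the resolved URL for each path given on the command line.
        for (String arg : args) {
            System.out.println(toRealFileUrl(arg));
        }
    }
}
```

The printed URL can then be passed to ./nutch parsechecker or 
./nutch indexchecker in place of the symlinked one.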


was (Author: angela_wang):
Please make sure 
/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/ is 
a "real" path, i.e., there are no symbolic links in the path. Parse objects 
are stored in the ParseResult under the "real" path, which may cause an NPE 
when no ParseResult is available for the original path.

> Java.lang.NullPointerException when using the parsechecker and indexchecker
> ---------------------------------------------------------------------------
>
>                 Key: NUTCH-1884
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1884
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer, parser
>    Affects Versions: 1.9
>         Environment: Mac OS X 10.9.2
> Apache Maven 2.2.1
> Java version: 1.7.0_51
>            Reporter: Mengying Wang
>            Priority: Minor
>
> I have downloaded the Nutch source code from GitHub 
> (https://github.com/apache/nutch), applied the patches (NUTCH-1879 and 
> NUTCH-1880), and then reinstalled Nutch. The good news is that all 
> URLs now contain only one slash. Unfortunately, the 
> java.lang.NullPointerException warning/error still occurs for both the 
> parsechecker and indexchecker commands.
> Below is the running log:
> (1) $ ./nutch parsechecker 
> "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> signature: 17bdb44990391c96bb8d48d1802ff11c
> Couldn't pass score, url 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
>  (java.lang.NullPointerException)
> ---------
> Url
> ---------------
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/
> ---------
> ParseData
> ---------
> Version: 5
> Status: success(1,0)
> Title: Index of 
> /Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml
> Outlinks: 2
>   outlink: toUrl: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/
>  anchor: ../
>   outlink: toUrl: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/monitor.xml
>  anchor: monitor.xml
> Content Metadata: Content-Length=352 nutch.crawl.score=0.0 Last-Modified=Tue, 
> 14 Oct 2014 20:05:50 GMT Content-Type=text/html 
> Parse Metadata: CharEncodingForConversion=windows-1252 
> OriginalCharEncoding=windows-1252 
> (2) $ ./nutch indexchecker 
> "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: 
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> Exception in thread "main" java.lang.NullPointerException
>       at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:139)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>       at org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:177)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
