[ https://issues.apache.org/jira/browse/NUTCH-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-1884:
-----------------------------------
    Affects Version/s:     (was: 2.2.1)

> NullPointerException in parsechecker and indexchecker with symlinks in file URL
> -------------------------------------------------------------------------------
>
>                 Key: NUTCH-1884
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1884
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer, parser
>    Affects Versions: 1.9
>         Environment: Mac OS X 10.9.2
>                      Apache Maven 2.2.1
>                      Java version: 1.7.0_51
>            Reporter: Mengying Wang
>            Priority: Minor
>             Fix For: 2.4, 1.10
>
>         Attachments: NUTCH-1884-trunk-v1.patch
>
>
> I downloaded the Nutch source code from GitHub
> (https://github.com/apache/nutch), applied the patches for NUTCH-1879 and
> NUTCH-1880, and then reinstalled Nutch. The good news is that all URLs now
> contain only one slash. Unfortunately, the java.lang.NullPointerException
> warning/error still occurs for both the parsechecker and indexchecker
> commands.
> Below is the running log:
>
> (1) $ ./nutch parsechecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> signature: 17bdb44990391c96bb8d48d1802ff11c
> Couldn't pass score, url file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/ (java.lang.NullPointerException)
> ---------
> Url
> ---------------
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/
> ---------
> ParseData
> ---------
> Version: 5
> Status: success(1,0)
> Title: Index of /Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml
> Outlinks: 2
>   outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/ anchor: ../
>   outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/monitor.xml anchor: monitor.xml
> Content Metadata: Content-Length=352 nutch.crawl.score=0.0 Last-Modified=Tue, 14 Oct 2014 20:05:50 GMT Content-Type=text/html
> Parse Metadata: CharEncodingForConversion=windows-1252 OriginalCharEncoding=windows-1252
>
> (2) $ ./nutch indexchecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:139)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:177)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)