[
https://issues.apache.org/jira/browse/NUTCH-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201121#comment-14201121
]
Hudson commented on NUTCH-1884:
-------------------------------
SUCCESS: Integrated in Nutch-trunk #2851 (See
[https://builds.apache.org/job/Nutch-trunk/2851/])
NUTCH-1884 NullPointerException in parsechecker and indexchecker with symlinks
in file URL (snagel:
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1637237)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
* /nutch/trunk/src/java/org/apache/nutch/parse/ParserChecker.java
> NullPointerException in parsechecker and indexchecker with symlinks in file
> URL
> -------------------------------------------------------------------------------
>
> Key: NUTCH-1884
> URL: https://issues.apache.org/jira/browse/NUTCH-1884
> Project: Nutch
> Issue Type: Bug
> Components: indexer, parser
> Affects Versions: 1.9
> Environment: Mac OS X 10.9.2
> Apache Maven 2.2.1
> Java version: 1.7.0_51
> Reporter: Mengying Wang
> Priority: Minor
> Fix For: 1.10
>
> Attachments: NUTCH-1884-trunk-v1.patch
>
>
> I downloaded the Nutch source code from GitHub
> (https://github.com/apache/nutch), applied the patches for NUTCH-1879 and
> NUTCH-1880, and then reinstalled Nutch. The good news is that all URLs now
> contain only one slash. Unfortunately, the java.lang.NullPointerException
> warning/error still occurs with both the parsechecker and indexchecker
> commands.
> Below is the running log:
> (1) $ ./nutch parsechecker
> "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> signature: 17bdb44990391c96bb8d48d1802ff11c
> Couldn't pass score, url
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> (java.lang.NullPointerException)
> ---------
> Url
> ---------------
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/
> ---------
> ParseData
> ---------
> Version: 5
> Status: success(1,0)
> Title: Index of
> /Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml
> Outlinks: 2
> outlink: toUrl:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/
> anchor: ../
> outlink: toUrl:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/monitor.xml
> anchor: monitor.xml
> Content Metadata: Content-Length=352 nutch.crawl.score=0.0 Last-Modified=Tue,
> 14 Oct 2014 20:05:50 GMT Content-Type=text/html
> Parse Metadata: CharEncodingForConversion=windows-1252
> OriginalCharEncoding=windows-1252
> (2) $ ./nutch indexchecker
> "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing:
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> Exception in thread "main" java.lang.NullPointerException
> at
> org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:139)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at
> org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:177)
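
In the log above, the URL passed on the command line (.../cas-curator/...) differs from the URL printed after parsing (.../cas-curator-0.6/...), which is consistent with a symlink being resolved along the way. The following is a minimal sketch, not Nutch code, of how a lookup keyed by the requested URL can then return null and lead to a NullPointerException when the result is dereferenced. The class name, the findParse helper, and the plain Map standing in for the parse results are all hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class SymlinkLookupSketch {

    // Hypothetical helper: try the requested URL first, then fall back to
    // the resolved URL. Returns null if neither key is present.
    static String findParse(Map<String, String> results,
                            String requestedUrl, String resolvedUrl) {
        String parse = results.get(requestedUrl);
        if (parse == null) {
            parse = results.get(resolvedUrl);
        }
        return parse;
    }

    public static void main(String[] args) {
        // The two URLs are copied from the log in the issue description.
        String requested =
            "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/";
        String resolved =
            "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/";

        // Assumption: the result is stored under the resolved URL only.
        Map<String, String> results = new HashMap<>();
        results.put(resolved, "success(1,0)");

        // A lookup by the requested URL misses; calling a method on this
        // value without a null check would throw a NullPointerException.
        System.out.println(results.get(requested));

        // A lookup that also considers the resolved URL finds the entry.
        System.out.println(findParse(results, requested, resolved));
    }
}
```

Whether the actual fix takes this shape is not stated in this message; the sketch only shows why a null check or a lookup by the resolved URL avoids the exception.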