[ https://issues.apache.org/jira/browse/NUTCH-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059387#comment-13059387 ]

Markus Jelsma commented on NUTCH-993:
-------------------------------------

There's an issue with ParseOutputFormat. It fails when running Nutch locally:

{code}
ParseSegment: segment: crawl/segments/20110704125233
Exception in thread "main" java.io.IOException: Segment already fetched!
        at org.apache.nutch.parse.ParseOutputFormat.checkOutputSpecs(ParseOutputFormat.java:86)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:157)
        at org.apache.nutch.parse.ParseSegment.run(ParseSegment.java:178)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.parse.ParseSegment.main(ParseSegment.java:164)

{code}

> NullPointerException at FetcherOutputFormat.checkOutputSpecs
> ------------------------------------------------------------
>
>                 Key: NUTCH-993
>                 URL: https://issues.apache.org/jira/browse/NUTCH-993
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.3
>         Environment: Cloudera CDH3 Cluster (hadoop 0.20.2-cdh3u0)
>            Reporter: Christian Guegi
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.4, 2.0
>
>         Attachments: FetcherOutputFormat.patch, ParseOutputFormat.patch
>
>
> When running Nutch as a MapReduce job on an existing cluster, I get a
> NullPointerException at
> org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs.
> The reason is that the passed-in reference to the file system is null.
> The attached patch ignores the parameter 'fs' and creates a new reference to
> the file system.
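
The approach described in the issue (ignore the passed-in 'fs' and obtain a fresh reference) can be modelled without Hadoop on the classpath. What follows is a minimal sketch, not the attached patch: JobSettings, SegmentStore and storeFor are hypothetical stand-ins for the job configuration, the file system handle and the file system lookup, and the "crawl_fetch" subdirectory name is assumed for illustration. In Hadoop itself the lookup would presumably be FileSystem.get(job).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CheckOutputSpecsSketch {

    /** Hypothetical stand-in for a job configuration; not a Hadoop class. */
    static final class JobSettings {
        final Path outputDir;

        JobSettings(Path outputDir) {
            this.outputDir = outputDir;
        }
    }

    /** Hypothetical stand-in for a file system handle; not a Hadoop class. */
    interface SegmentStore {
        boolean exists(Path p);
    }

    /** Builds a fresh handle from the job settings (here backed by the local disk). */
    static SegmentStore storeFor(JobSettings job) {
        return p -> Files.exists(p);
    }

    /**
     * Never dereferences the first parameter, which the caller may pass as null.
     * The handle actually used is always derived from the job settings.
     */
    static void checkOutputSpecs(SegmentStore ignored, JobSettings job) throws IOException {
        SegmentStore store = storeFor(job);
        // "crawl_fetch" is an assumed marker directory name, used for illustration.
        if (store.exists(job.outputDir.resolve("crawl_fetch"))) {
            throw new IOException("Segment already fetched!");
        }
    }

    public static void main(String[] args) throws IOException {
        Path segment = Files.createTempDirectory("segment");
        // Passing null mirrors what the cluster did in the report; no NPE occurs.
        checkOutputSpecs(null, new JobSettings(segment));
        System.out.println("null handle tolerated");
    }
}
```

Because the check no longer depends on the caller's handle, it behaves the same whether that handle is null or not; whether it behaves correctly for every job type is a separate question.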

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
