[ 
https://issues.apache.org/jira/browse/NUTCH-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633615#action_12633615
 ] 

Hudson commented on NUTCH-633:
------------------------------

Integrated in Nutch-trunk #580 (See 
[http://hudson.zones.apache.org/hudson/job/Nutch-trunk/580/])

> ParseSegment no longer allows reparsing
> ---------------------------------------
>
>                 Key: NUTCH-633
>                 URL: https://issues.apache.org/jira/browse/NUTCH-633
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>         Environment: any
>            Reporter: Xue Yong Zhi
>            Assignee: Doğacan Güney
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: NUTCH_633.patch
>
>
> ParseSegment used to allow reparsing even when parsing was enabled in 
> Fetcher. It now throws a NumberFormatException because 
> 'content.getMetadata().get(Nutch.FETCH_STATUS_KEY)' returns null in that case.
> This patch fixes the problem:
> --- a/src/java/org/apache/nutch/parse/ParseSegment.java
> +++ b/src/java/org/apache/nutch/parse/ParseSegment.java
> @@ -70,8 +70,10 @@ public class ParseSegment extends Configured implements Tool, Mapper<WritableCom
>        key = newKey;
>      }
>      
> +    //status_key is only available when parsing is not done in fetcher
> +    String status_key = content.getMetadata().get(Nutch.FETCH_STATUS_KEY);
>      int status =
> -      Integer.parseInt(content.getMetadata().get(Nutch.FETCH_STATUS_KEY));
> +      (null == status_key) ? CrawlDatum.STATUS_FETCH_SUCCESS : Integer.parseInt(status_key);
>      if (status != CrawlDatum.STATUS_FETCH_SUCCESS) {
>        // content not fetched successfully, skip document
>        LOG.debug("Skipping " + key + " as content is not fetched successfully");
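
The null check in the patch can be tried in isolation. The following is only a
minimal sketch, not the Nutch code itself: a plain Map stands in for the content
metadata, and the class name, the key string, and the numeric status value are
placeholders invented for illustration. The real values come from
Nutch.FETCH_STATUS_KEY and CrawlDatum.STATUS_FETCH_SUCCESS.

```java
import java.util.HashMap;
import java.util.Map;

public class FetchStatusSketch {
    // Placeholder values (assumptions); the real constants are
    // Nutch.FETCH_STATUS_KEY and CrawlDatum.STATUS_FETCH_SUCCESS.
    static final String FETCH_STATUS_KEY = "fetch.status.placeholder";
    static final int STATUS_FETCH_SUCCESS = 1;

    // Treat a missing status entry as a successful fetch instead of
    // handing null to Integer.parseInt, which throws NumberFormatException.
    static int resolveStatus(Map<String, String> metadata) {
        String statusKey = metadata.get(FETCH_STATUS_KEY);
        return (statusKey == null)
            ? STATUS_FETCH_SUCCESS
            : Integer.parseInt(statusKey);
    }

    public static void main(String[] args) {
        // No status entry: the situation the issue describes.
        Map<String, String> withoutStatus = new HashMap<>();

        // A status entry that differs from the success placeholder.
        Map<String, String> withOtherStatus = new HashMap<>();
        withOtherStatus.put(FETCH_STATUS_KEY, "2");

        System.out.println(resolveStatus(withoutStatus) == STATUS_FETCH_SUCCESS);
        System.out.println(resolveStatus(withOtherStatus) == STATUS_FETCH_SUCCESS);
    }
}
```

With a missing entry the method falls back to the success value, so the
"skip document" branch is not taken and the content is parsed again. A status
string that is present is still parsed as before.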

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.