[ https://issues.apache.org/jira/browse/NUTCH-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633615#action_12633615 ]
Hudson commented on NUTCH-633:
------------------------------

Integrated in Nutch-trunk #580 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/580/])

> ParseSegment no longer allow reparsing
> --------------------------------------
>
>                 Key: NUTCH-633
>                 URL: https://issues.apache.org/jira/browse/NUTCH-633
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>         Environment: any
>            Reporter: Xue Yong Zhi
>            Assignee: Doğacan Güney
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: NUTCH_633.patch
>
>
> ParseSegment used to allow reparsing even if parsing has been enabled in
> Fetcher. But now it throws a NumberFormatException as
> 'content.getMetadata().get(Nutch.FETCH_STATUS_KEY)' is null.
> This patch will fix the problem:
>
> --- a/src/java/org/apache/nutch/parse/ParseSegment.java
> +++ b/src/java/org/apache/nutch/parse/ParseSegment.java
> @@ -70,8 +70,10 @@ public class ParseSegment extends Configured implements Tool, Mapper<WritableCom
>        key = newKey;
>      }
>
> +    //status_key is only available when parsing is not done in fetcher
> +    String status_key = content.getMetadata().get(Nutch.FETCH_STATUS_KEY);
>      int status =
> -      Integer.parseInt(content.getMetadata().get(Nutch.FETCH_STATUS_KEY));
> +      (null == status_key) ? CrawlDatum.STATUS_FETCH_SUCCESS : Integer.parseInt(status_key);
>      if (status != CrawlDatum.STATUS_FETCH_SUCCESS) {
>        // content not fetched successfully, skip document
>        LOG.debug("Skipping " + key + " as content is not fetched successfully");

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
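The null-safe lookup in the NUTCH-633 patch can be illustrated in isolation with the Java sketch below. It is not Nutch code: StatusFallback, resolveStatus and unpatchedStatus are hypothetical names, a plain java.util.Map stands in for the content metadata, and the key string and status value are placeholder assumptions rather than the real Nutch.FETCH_STATUS_KEY and CrawlDatum.STATUS_FETCH_SUCCESS constants.

```java
import java.util.HashMap;
import java.util.Map;

public class StatusFallback {
    // Placeholder stand-ins (assumptions) for Nutch.FETCH_STATUS_KEY and
    // CrawlDatum.STATUS_FETCH_SUCCESS; the real values are defined in Nutch.
    static final String FETCH_STATUS_KEY = "fetch_status";
    static final int STATUS_FETCH_SUCCESS = 33;

    // Pre-patch behaviour: Integer.parseInt(null) throws NumberFormatException
    // when the metadata has no fetch status entry.
    static int unpatchedStatus(Map<String, String> metadata) {
        return Integer.parseInt(metadata.get(FETCH_STATUS_KEY));
    }

    // Patched behaviour: a missing key (content already parsed in the Fetcher)
    // is treated as a successful fetch instead of being parsed.
    static int resolveStatus(Map<String, String> metadata) {
        String statusKey = metadata.get(FETCH_STATUS_KEY);
        return (statusKey == null) ? STATUS_FETCH_SUCCESS : Integer.parseInt(statusKey);
    }

    public static void main(String[] args) {
        Map<String, String> parsedInFetcher = new HashMap<>();
        try {
            unpatchedStatus(parsedInFetcher);
        } catch (NumberFormatException e) {
            System.out.println("unpatched: NumberFormatException");
        }
        System.out.println(resolveStatus(parsedInFetcher)); // prints 33

        Map<String, String> fetchedOnly = new HashMap<>();
        fetchedOnly.put(FETCH_STATUS_KEY, "37");
        System.out.println(resolveStatus(fetchedOnly)); // prints 37
    }
}
```

Defaulting to the success status follows the comment in the patch: the key is only written when parsing is not done in the Fetcher, so its absence is taken to mean the content was already fetched and parsed there.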