[ https://issues.apache.org/jira/browse/NUTCH-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392138#comment-14392138 ]
Jorge Luis Betancourt Gonzalez commented on NUTCH-1771:
-------------------------------------------------------
+1 for this patch and for [~wastl-nagel]. Moving to a new class would also
make it possible to write a small "segment checker": if the crawl process is
stopped by a hard reboot, for instance, such a tool could help locate the
problematic segment before the crawl is started again.
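
To make the "segment checker" idea concrete, here is a minimal sketch. It is
not part of the patch: the class name SegmentChecker and its methods are
hypothetical, it walks a local filesystem with java.nio rather than going
through Hadoop's FileSystem API, and it assumes that a segment is indexable
only if it contains the crawl_fetch, crawl_parse, parse_data and parse_text
subdirectories. Incomplete segments are logged and skipped.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not an existing Nutch class.
public class SegmentChecker {

    // Assumption: these subdirectories must exist before a segment can be indexed.
    static final List<String> REQUIRED =
        Arrays.asList("crawl_fetch", "crawl_parse", "parse_data", "parse_text");

    // Returns the required subdirectories that are absent from one segment.
    static List<String> missingParts(Path segment) {
        List<String> missing = new ArrayList<>();
        for (String part : REQUIRED) {
            if (!Files.isDirectory(segment.resolve(part))) {
                missing.add(part);
            }
        }
        return missing;
    }

    // Returns the complete segments under segmentsDir, sorted by path;
    // incomplete ones are reported on stderr and left out.
    static List<Path> completeSegments(Path segmentsDir) throws IOException {
        List<Path> complete = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(segmentsDir)) {
            for (Path segment : stream) {
                if (!Files.isDirectory(segment)) {
                    continue; // ignore stray files next to the segments
                }
                List<String> missing = missingParts(segment);
                if (missing.isEmpty()) {
                    complete.add(segment);
                } else {
                    System.err.println("Skipping segment " + segment
                        + ": missing " + missing);
                }
            }
        }
        Collections.sort(complete);
        return complete;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SegmentChecker <segments directory>
        for (Path segment : completeSegments(Paths.get(args[0]))) {
            System.out.println(segment);
        }
    }
}
```

Run against the segments directory after an interrupted crawl, this prints the
segments that look complete and names the ones that do not, so they can be
removed or re-fetched before indexing. It only checks that the directories
exist; it does not detect corrupted data files inside them.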
> Solrindex fails if a segment is corrupted or incomplete
> -------------------------------------------------------
>
> Key: NUTCH-1771
> URL: https://issues.apache.org/jira/browse/NUTCH-1771
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 1.8, 1.10
> Reporter: Diaa
> Priority: Minor
> Fix For: 1.11
>
>
> When solrindex is used to index multiple segments via -dir segment,
> indexing fails if one or more segments are corrupted or incomplete
> (for example, generated but not fetched).
> The failure is reported only as a java.io exception.
> Deleting the offending segment fixes the issue.
> The expected behavior should be one of the following:
> * skipping the segment and proceeding with others (while logging)
> * stopping the indexing and logging the failed segment