[ https://issues.apache.org/jira/browse/NUTCH-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394230#comment-14394230 ]
Sebastian Nagel commented on NUTCH-1771:
----------------------------------------
Again: nice patch.
* SegmentChecker holds the state of a segment in private fields: why not
require the caller to pass the segment's Path and FileSystem to the
constructor? This would avoid errors when the object is reused without
resetting its state (via setFlags()). We could also provide a
reset(path, fs) method. Alternatively, make the check function static and
cache nothing.
* To keep SegmentMerger extensible: maybe rename isSegmentValid() to, e.g.,
isIndexable()? We could then add other methods later to check the sanity and
status of a segment (generated, fetched, parsed).
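
A minimal sketch of the static, stateless variant with one method per status,
to make the suggestion concrete. It is not the code from the patch: the class
name SegmentChecks is hypothetical, java.nio.file stands in for Hadoop's Path
and FileSystem so the example runs on its own, and the mapping of status to
subdirectories (crawl_generate, crawl_fetch, crawl_parse, parse_data,
parse_text) is an assumption about the usual segment layout.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stateless segment checks: nothing is cached between calls. */
final class SegmentChecks {

  private SegmentChecks() {
    // static helpers only, no instances and therefore no state to reset
  }

  /** Assumption: a generated segment contains crawl_generate. */
  static boolean isGenerated(Path segment) {
    return hasDirs(segment, "crawl_generate");
  }

  /** Assumption: a fetched segment contains crawl_fetch. */
  static boolean isFetched(Path segment) {
    return hasDirs(segment, "crawl_fetch");
  }

  /** Assumption: a parsed segment contains crawl_parse, parse_data, parse_text. */
  static boolean isParsed(Path segment) {
    return hasDirs(segment, "crawl_parse", "parse_data", "parse_text");
  }

  /** A segment is treated as indexable once it is both fetched and parsed. */
  static boolean isIndexable(Path segment) {
    return isFetched(segment) && isParsed(segment);
  }

  /** True only if every named subdirectory exists under the segment. */
  private static boolean hasDirs(Path segment, String... names) {
    for (String name : names) {
      if (!Files.isDirectory(segment.resolve(name))) {
        return false;
      }
    }
    return true;
  }
}
```

Because every call takes the segment path as an argument, a caller iterating
over many segments cannot pick up stale flags from a previous segment, and
it can log and skip any segment for which isIndexable() returns false.
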
> Solrindex fails if a segment is corrupted or incomplete
> -------------------------------------------------------
>
> Key: NUTCH-1771
> URL: https://issues.apache.org/jira/browse/NUTCH-1771
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 1.8, 1.10
> Reporter: Diaa
> Priority: Minor
> Fix For: 1.11
>
>
> When using solrindex to index multiple segments via -dir segment,
> the indexing fails if one or more segments are corrupted or incomplete
> (generated but not fetched, for example).
> The failure is simply a java.io exception.
> Deleting the segment fixes the issue.
> The expected behavior should be one of the following:
> * skipping the segment and proceeding with others (while logging)
> * stopping the indexing and logging the failed segment
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)