[ 
https://issues.apache.org/jira/browse/NUTCH-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394230#comment-14394230
 ] 

Sebastian Nagel commented on NUTCH-1771:
----------------------------------------

Again: nice patch.
* SegmentChecker holds the state of a segment in private fields: why not force 
the user to pass the segment's Path and FileSystem in the constructor? This would 
avoid errors if the object is re-used and the state is not reset (via 
setFlags()). We could also provide a reset(path, fs) method. Alternatively, 
make the check function static without caching anything.
* To keep SegmentMerger extensible: maybe rename isSegmentValid() to, e.g., 
isIndexable()? We could then add other methods later to check sanity and 
status (generated, fetched, parsed).
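
To illustrate the static, non-caching variant together with the 
isIndexable() naming, here is a minimal sketch. It is not the code from the 
patch: the class name SegmentChecks is hypothetical, and it uses 
java.nio.file instead of Hadoop's Path and FileSystem so that it runs 
without Hadoop on the classpath. The subdirectory names follow the usual 
Nutch 1.x segment layout; which of them indexing actually requires is an 
assumption here.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stateless segment checks; nothing is cached between calls. */
final class SegmentChecks {
    private SegmentChecks() {}

    // True if the segment contains the given subdirectory.
    private static boolean has(Path segment, String subDir) {
        return Files.isDirectory(segment.resolve(subDir));
    }

    static boolean isGenerated(Path segment) {
        return has(segment, "crawl_generate");
    }

    static boolean isFetched(Path segment) {
        return has(segment, "crawl_fetch");
    }

    static boolean isParsed(Path segment) {
        return has(segment, "crawl_parse")
            && has(segment, "parse_data")
            && has(segment, "parse_text");
    }

    // Assumption: indexing needs the fetch and parse output, not crawl_generate.
    static boolean isIndexable(Path segment) {
        return isFetched(segment) && isParsed(segment);
    }
}
```

Since every call takes the segment path as an argument, there is no state 
to reset when checking several segments in a loop.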

> Solrindex fails if a segment is corrupted or incomplete
> -------------------------------------------------------
>
>                 Key: NUTCH-1771
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1771
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.8, 1.10
>            Reporter: Diaa
>            Priority: Minor
>             Fix For: 1.11
>
>
> When solrindex is used to index multiple segments via -dir segment,
> indexing fails if one or more segments are corrupted or incomplete
> (for example, generated but not fetched).
> The failure surfaces only as a java.io exception.
> Deleting the segment fixes the issue.
> The expected behavior should be one of the following:
> * skipping the segment and proceeding with others (while logging)
> * stopping the indexing and logging the failed segment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)