[ https://issues.apache.org/jira/browse/NUTCH-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-1978.
------------------------------------
Resolution: Duplicate
Hi [~chongli], this is clearly a duplicate of NUTCH-1771. It's better to change
the affected versions there. The comments and discussion have been moved there
as well. Thanks!
> solrindex will fail when indexing corrupted segments
> ----------------------------------------------------
>
> Key: NUTCH-1978
> URL: https://issues.apache.org/jira/browse/NUTCH-1978
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 1.10
> Reporter: Chong Li
> Priority: Minor
> Fix For: 1.10
>
>
> This is the same issue as NUTCH-1771, but it seems this bug will appear in
> most versions, since none of them has code to handle corrupted segments.
> In NUTCH-1771, people pointed out that it would be very hard to handle this
> in the Hadoop layer, and that the program should skip corrupted segments
> instead of terminating. By a corrupted segment I mean, for example, a segment
> that has just been generated and does not have any content yet.
> So my initial idea is to check whether the segment folder is valid before
> adding the segment to the Hadoop job. If the segment is not valid, we can
> simply skip it. One check is whether the segment folder contains exactly six
> subdirectories, as a complete segment should. The other approach would be to
> check the names of all six subdirectories and confirm that they are exactly
> the six directories that should appear.