[ 
https://issues.apache.org/jira/browse/NUTCH-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-1978.
------------------------------------
    Resolution: Duplicate

Hi [~chongli], this is clearly a duplicate of NUTCH-1771. It's better to change 
the affected versions there. The comments and discussion have also been moved 
there. Thanks!

> solrindex will fail when indexing corrupted segments
> ----------------------------------------------------
>
>                 Key: NUTCH-1978
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1978
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.10
>            Reporter: Chong Li
>            Priority: Minor
>             Fix For: 1.10
>
>
> This is the same issue as NUTCH-1771, but it seems the bug appears in most 
> versions, since none of them has code to handle corrupted segments.
> From NUTCH-1771, people pointed out that it would be very hard to handle this 
> in the Hadoop layer, and that the program should skip corrupted segments 
> instead of terminating. By corrupted segments I mean segments that may have 
> just been generated and do not yet have the content.
> So my initial idea is to check whether the segment folder is valid before 
> passing the segment to the Hadoop job. If the segment is not valid, we can 
> simply skip it. We can check whether the segment folder contains exactly 6 
> subdirectories, as it should. The other approach would be to check all 
> six subdirectories and see whether they are exactly the six that should 
> appear.
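
The check proposed in the description could be sketched roughly as follows. This is only an illustration, not the fix applied in NUTCH-1771: the class name SegmentValidator and its methods are hypothetical, and it uses java.nio.file on a local file system for brevity, whereas Nutch itself would go through Hadoop's FileSystem API. The six subdirectory names are the ones a fully fetched and parsed Nutch 1.x segment normally contains.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Nutch: decides which segments are
// complete enough to be passed to the indexing job.
public class SegmentValidator {

    // Subdirectories of a fully fetched and parsed Nutch 1.x segment.
    static final Set<String> EXPECTED_SUBDIRS = Set.of(
        "content", "crawl_fetch", "crawl_generate",
        "crawl_parse", "parse_data", "parse_text");

    // True only if every expected subdirectory exists under the segment.
    static boolean isCompleteSegment(Path segment) {
        if (!Files.isDirectory(segment)) {
            return false;
        }
        for (String name : EXPECTED_SUBDIRS) {
            if (!Files.isDirectory(segment.resolve(name))) {
                return false;
            }
        }
        return true;
    }

    // Keeps complete segments and logs the ones that are skipped,
    // instead of letting a single bad segment fail the whole job.
    static List<Path> filterValid(List<Path> segments) {
        List<Path> valid = new ArrayList<>();
        for (Path segment : segments) {
            if (isCompleteSegment(segment)) {
                valid.add(segment);
            } else {
                System.err.println("Skipping incomplete segment: " + segment);
            }
        }
        return valid;
    }
}
```

Checking the subdirectories by name corresponds to the second approach in the description. It is slightly more tolerant than counting exactly six entries, because an unexpected extra file or directory in the segment does not cause the segment to be rejected.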



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
