[ 
https://issues.apache.org/jira/browse/SOLR-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276897#comment-15276897
 ] 

Uwe Schindler edited comment on SOLR-9091 at 5/9/16 8:01 PM:
-------------------------------------------------------------

Every segment in an index also has a unique identifier written into its 
files. So you can compare both: the checksum to confirm that the file is 
intact (not modified), and the unique ID to confirm that it really belongs to 
the same segment. CodecUtil.checkIndexHeader validates the ID in the header of 
each file. Just make sure that the IDs are identical.
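
As a rough illustration of the comparison described above, here is a minimal 
sketch in plain Java. The class SegmentFileFingerprint and its fields are 
hypothetical names invented for this example; they are not part of Lucene or 
Solr, and the sketch assumes the ID and the checksum have already been read 
from the two files by some other means.

```java
import java.util.Arrays;

// Hypothetical holder for the two values being compared; not a Lucene class.
final class SegmentFileFingerprint {
    final byte[] segmentId; // unique segment ID taken from the file header
    final long checksum;    // checksum recorded for the file

    SegmentFileFingerprint(byte[] segmentId, long checksum) {
        this.segmentId = segmentId.clone(); // defensive copy
        this.checksum = checksum;
    }

    // Two files are treated as the same only if BOTH the ID and the checksum agree.
    boolean matches(SegmentFileFingerprint other) {
        return checksum == other.checksum
            && Arrays.equals(segmentId, other.segmentId);
    }
}
```

An equal checksum with a different ID would mean the files come from different 
segments, so the sketch requires both values to agree.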

See the slides from last year's talk about Lucene 5: 
https://berlinbuzzwords.de/session/apache-lucene-5-new-features-and-improvements-apache-solr-and-elasticsearch,
 pages 30 and 31 of the PDF file.
Both the checksum and the unique segment ID were designed for exactly this 
replication/backup case.

There is no need for any additional checks or for time- and I/O-intensive 
operations. Just compare the two identifiers and you know the files are from 
the same index and the same segment, and have the same contents. When you 
transfer them, you can recalculate the CRC checksum afterwards to ensure the 
transfer was successful.
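
To illustrate the post-transfer check, here is a minimal sketch that 
recomputes a CRC32 over a stream using only the JDK. The class TransferCheck 
and its method names are hypothetical. The sketch checksums every byte it is 
given and knows nothing about Lucene's file layout; inside Lucene itself, 
CodecUtil.checksumEntireFile would be the more natural choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;

// Hypothetical helper; not part of Lucene or Solr.
final class TransferCheck {

    // Reads the stream to its end and returns the CRC32 of all bytes read.
    static long crc32Of(InputStream in) {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return crc.getValue();
    }

    // True if the copy has the checksum that was expected for the source.
    static boolean transferredIntact(long expectedChecksum, InputStream copy) {
        return crc32Of(copy) == expectedChecksum;
    }
}
```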


> Solr index restore silently copies the corrupt segments in the backup
> ---------------------------------------------------------------------
>
>                 Key: SOLR-9091
>                 URL: https://issues.apache.org/jira/browse/SOLR-9091
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hrishikesh Gadre
>
> The Solr core restore functionality uses the following criteria to decide 
> whether a given file is copied from the backup directory or from the current 
> index directory.
> case 1] File is available in both the backup and the current index directory
> --> Compare the checksum and file length
>   --> If the checksum and length match, copy the file from the current 
> working directory.
>   --> If the checksum and length don't match, copy the file from the backup 
> directory. 
> case 2] File is available only in the backup directory (this can happen for a 
> newly created core without any data).
> --> Copy the file from the backup directory. 
> Now the problem here is that we intentionally catch and ignore the error 
> while reading the checksum for a file in the backup directory. Hence in case 
> (2), a file is restored without an appropriate "checksum".
> Here is the relevant code snippet:
> https://github.com/apache/lucene-solr/blob/a5586d29b23f7d032e6d8f0cf8758e56b09e0208/solr/core/src/java/org/apache/solr/handler/RestoreCore.java#L82-L95
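
The decision rules quoted above can be sketched as follows. This is not the 
code from RestoreCore.java; the names RestorePlan, FileMeta and chooseSource 
are hypothetical, and failing fast on an unreadable backup checksum is one 
possible way to avoid the silent copy, not necessarily the fix chosen for 
this issue.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the restore decision; not Solr's implementation.
final class RestorePlan {
    enum Source { CURRENT_INDEX, BACKUP }

    static final class FileMeta {
        final long length;
        final OptionalLong checksum; // empty if the checksum could not be read

        FileMeta(long length, OptionalLong checksum) {
            this.length = length;
            this.checksum = checksum;
        }
    }

    // 'current' is null when the file exists only in the backup (case 2).
    static Source chooseSource(FileMeta backup, FileMeta current) {
        // Surface an unreadable backup checksum instead of ignoring it.
        if (!backup.checksum.isPresent()) {
            throw new IllegalStateException("backup file has no readable checksum");
        }
        if (current == null) {
            return Source.BACKUP; // case 2: only the backup has the file
        }
        boolean same = current.checksum.isPresent()
            && current.checksum.getAsLong() == backup.checksum.getAsLong()
            && current.length == backup.length;
        return same ? Source.CURRENT_INDEX : Source.BACKUP; // case 1
    }
}
```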


