Hi all,


We're occasionally observing corrupted indexes in production on Windows 
Server. We tracked the problem down to how NTFS behaves on partial writes.



When the disk or the machine fails during a flush, it's possible on NTFS for 
the file being written to have already been extended to its new length while 
the corresponding data never made it to disk. For security reasons (so that 
stale disk contents are never exposed), NTFS returns all zeros when reading 
past the last successfully written point after the system restarts.
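To make the resulting on-disk state concrete: the file's length grows, but the tail reads back as zeros. Here's a minimal Java sketch that produces a file in that shape. This is not the NTFS crash path itself, just an imitation of the outcome; note that RandomAccessFile.setLength leaves the extended contents formally unspecified, though on the platforms I've tried it reads back as zeros:

import java.io.IOException;
import java.io.RandomAccessFile;

public class ZeroTailDemo {
  public static void main(String[] args) throws IOException {
    try (RandomAccessFile f = new RandomAccessFile("zero-tail.demo", "rw")) {
      f.write(new byte[] { 1, 2, 3, 4 }); // the data that actually reached disk
      f.setLength(20);                    // length extended; tail is zero-filled
                                          // (unspecified per the JDK docs, zeros in practice)
      f.seek(0);
      byte[] buf = new byte[20];
      f.readFully(buf);
      for (byte b : buf) {
        System.out.print(b + " ");        // prints: 1 2 3 4 0 0 ... 0
      }
    }
  }
}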



Lucene's commit code relies on writing an updated .gen file as the last step 
of an index flush/update. In the scenario above, the file exists but contains 
only zeros, which Lucene cannot parse. A failure at this point therefore 
leaves the whole index in an unreadable state.
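For anyone who wants to inspect a damaged index: the gen file is tiny and easy to dump. A quick sketch follows; the layout (an int version marker followed by the generation written twice, skipping the trailing checksum) is from my reading of the 4.x SegmentInfos code, so treat it as a rough guide rather than authoritative:

import java.io.File;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

public class DumpGen {
  public static void main(String[] args) throws Exception {
    try (FSDirectory dir = FSDirectory.open(new File(args[0]));
         IndexInput in = dir.openInput("segments.gen", IOContext.READONCE)) {
      int version = in.readInt(); // an all-zero file yields 0 here, not a valid marker
      long gen0 = in.readLong();  // generation, written twice as a consistency check
      long gen1 = in.readLong();
      System.out.println("version=" + version + " gen=" + gen0 + "/" + gen1);
    }
  }
}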



We think the safest approach, and one that is robust to reordered writes, is 
to treat a gen file containing all zeros the same as a missing gen file. This 
assumes that by the time the gen file is fsync'ed, all other index files have 
already been flushed to disk explicitly; if that's not the case, there's still 
exposure to reordered writes.
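Concretely, the check we have in mind would look something like this. It's a minimal sketch in plain java.nio rather than Lucene's Directory API; ignoreGenFile and the fallback behavior are my naming, not existing Lucene code:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GenFileCheck {
  /** Returns true if the gen file should be ignored: missing or all zeros. */
  static boolean ignoreGenFile(Path genFile) throws IOException {
    if (!Files.exists(genFile)) {
      return true; // no gen file: fall back to scanning for segments_N files
    }
    byte[] content = Files.readAllBytes(genFile);
    for (byte b : content) {
      if (b != 0) {
        return false; // real content: use the gen file as usual
      }
    }
    // Length was extended but the data never hit the disk:
    // treat it exactly like a missing gen file.
    return true;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(ignoreGenFile(Paths.get(args[0])));
  }
}

As far as I can tell, Lucene already falls back to scanning the directory for the highest segments_N when segments.gen is absent, so the all-zero case would simply reuse that existing path.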



I don't have a repro at this point. Before digging deeper into this, I wanted 
to see what the Lucene devs think. Does the proposed fix make sense? Any ideas 
on how to set up a reproducible test for this issue?
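One idea for a deterministic test, without needing a real power failure: commit normally, then overwrite segments.gen in place with zeros of the same length to mimic the post-restart state, and see whether the index still opens. A rough sketch against Lucene 4.10 follows; the index path and the zero-fill step are just test fixture, and the comment near the end reflects what we observed in production rather than something I've confirmed in isolation:

import java.io.File;
import java.io.RandomAccessFile;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ZeroGenRepro {
  public static void main(String[] args) throws Exception {
    File path = new File("test-index");
    try (FSDirectory dir = FSDirectory.open(path)) {
      IndexWriterConfig cfg = new IndexWriterConfig(
          Version.LUCENE_4_10_4, new StandardAnalyzer(Version.LUCENE_4_10_4));
      try (IndexWriter writer = new IndexWriter(dir, cfg)) {
        Document doc = new Document();
        doc.add(new TextField("body", "hello world", Store.YES));
        writer.addDocument(doc);
        writer.commit(); // writes segments_N, then segments.gen last
      }
    }

    // Simulate the NTFS outcome: same file length, all-zero content.
    File gen = new File(path, "segments.gen");
    try (RandomAccessFile raf = new RandomAccessFile(gen, "rw")) {
      raf.write(new byte[(int) raf.length()]); // overwrite from offset 0 with zeros
    }

    // If our analysis is right, opening fails here on 4.10.4; with the
    // proposed fix it should succeed via the segments_N fallback.
    try (FSDirectory dir = FSDirectory.open(path);
         DirectoryReader reader = DirectoryReader.open(dir)) {
      System.out.println("opened, maxDoc=" + reader.maxDoc());
    }
  }
}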



We verified this on Elasticsearch 1.7.1, which uses Lucene 4.10.4. Have there 
been significant changes to this area in newer Lucene versions?



// Thomas
