[ https://issues.apache.org/jira/browse/LUCENE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-6287:
---------------------------------------
    Attachment: LUCENE-6287.patch

Patch with a simple fix. I'm beasting the test and so far so good; I'll 
leave it running.

IW already holds an incRef'd set of files that are in flight for the commit, 
so I just fixed it to recompute that set after SIS.prepareCommit (which may 
write the .si/marker files) and incRef the new set with IFD.  This protects 
the files while the commit runs; when the commit finishes we incRef them with 
IFD again, and they are permanent after that.
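
To illustrate the idea (this is a rough sketch, not the code in the patch), 
here is a toy model of the race.  ToyDeleter is a hypothetical stand-in for 
IFD, the file names are made up, and releasing the stale in-flight set with 
decRef is my assumption about the bookkeeping:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types are Lucene classes.
class CommitRefSketch {

    /** Hypothetical stand-in for a ref-counting file deleter such as IFD. */
    static final class ToyDeleter {
        private final Map<String, Integer> refCounts = new HashMap<>();
        private final Set<String> onDisk = new HashSet<>();

        void create(String file) { onDisk.add(file); }

        void incRef(Collection<String> files) {
            for (String f : files) {
                refCounts.merge(f, 1, Integer::sum);
            }
        }

        void decRef(Collection<String> files) {
            for (String f : files) {
                // Returning null drops the entry once the count reaches zero.
                refCounts.computeIfPresent(f, (k, v) -> v > 1 ? v - 1 : null);
            }
        }

        /** Removes every file with no references, as a finishing merge might trigger. */
        void deleteUnreferenced() {
            onDisk.removeIf(f -> !refCounts.containsKey(f));
        }

        boolean exists(String file) { return onDisk.contains(file); }
    }

    /** Returns true if the (illustrative) _0.si file survives the commit. */
    static boolean simulate(boolean recomputeAfterPrepare) {
        ToyDeleter deleter = new ToyDeleter();
        List<String> inFlight = new ArrayList<>(Arrays.asList("_0.fnm", "_0.frq"));
        for (String f : inFlight) deleter.create(f);
        deleter.incRef(inFlight); // files protected when the commit starts

        // prepareCommit may write a new .si file for the old-format segment.
        deleter.create("_0.si");
        List<String> commitFiles = new ArrayList<>(inFlight);
        commitFiles.add("_0.si");

        if (recomputeAfterPrepare) {
            deleter.incRef(commitFiles);   // protect the recomputed set first
            deleter.decRef(inFlight);      // assumption: the stale set is released
            inFlight = commitFiles;
        }

        deleter.deleteUnreferenced(); // concurrent merge finishes mid-commit

        deleter.incRef(commitFiles);  // commit finished: files become permanent
        deleter.decRef(inFlight);     // in-flight protection no longer needed
        return deleter.exists("_0.si");
    }

    public static void main(String[] args) {
        System.out.println("without recompute, _0.si survives: " + simulate(false));
        System.out.println("with recompute, _0.si survives: " + simulate(true));
    }
}
```

Without the recompute step, nothing references the newly written file when 
the deleter runs, so it is removed before the commit can claim it.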


> Corrupt index (missing .si file) on first 4.x commit to a 3.x index
> -------------------------------------------------------------------
>
>                 Key: LUCENE-6287
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6287
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Blocker
>             Fix For: 4.10.4
>
>         Attachments: LUCENE-6287.patch, LUCENE-6287.patch
>
>
> If you have a 3.x index, and you open it with a 4.x IndexWriter for
> the first time, and you do something that kicks off merges while
> concurrently committing, it's possible the index will corrupt itself
> with exceptions like this:
> {noformat}
> java.nio.file.NoSuchFileException: /l/tmp/reruns.TestBackwardsCompatibility3x.testMergeDuringUpgrade.t2/lucene.index.TestBackwardsCompatibility3x-71F31CCCEF6853A-001/manysegments.362-006/_0.si
>       at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>       at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>       at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>       at java.nio.channels.FileChannel.open(FileChannel.java:287)
>       at java.nio.channels.FileChannel.open(FileChannel.java:334)
>       at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
>       at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
>       at org.apache.lucene.codecs.lucene3x.Lucene3xSegmentInfoReader.read(Lucene3xSegmentInfoReader.java:106)
>       at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:358)
>       at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
>       at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
>       at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
>       at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:457)
>       at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:414)
>       at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:207)
>       at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:196)
>       at org.apache.lucene.store.BaseDirectoryWrapper.close(BaseDirectoryWrapper.java:45)
>       at org.apache.lucene.index.TestBackwardsCompatibility3x.testMergeDuringUpgrade(TestBackwardsCompatibility3x.java:1035)
> {noformat}
> Back compat tests in Elasticsearch hit this, and at first I thought maybe 
> LUCENE-6279 was the cause (I still think we should fix that) but after 
> further debugging there is a different concurrency bug lurking here.
> I have a test case which after substantial beasting is able to reproduce the 
> bug, but I don't yet have a fix.  I think IW is missing a checkpoint after 
> writing a new commit...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
