[ 
https://issues.apache.org/jira/browse/GEODE-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018363#comment-17018363
 ] 

ASF subversion and git services commented on GEODE-7703:
--------------------------------------------------------

Commit 9f4d70b849d7f7094fc78e589587bd28be65f122 in geode's branch 
refs/heads/develop from Juan José Ramos
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9f4d70b ]

GEODE-7703: Catch IndexWriter Exceptions (#4597)

The IndexWriter initialization might fail when other threads are
updating the fileAndChunkRegion, which can be triggered by other normal
operations (query, event listener, close, reindex, etc.). This doesn't
happen often and, instead of propagating the exception to the caller
and failing, Geode now catches it and returns null to let the callers
retry.

- Added unit and distributed tests.
- Return null instead of re-throwing the IOException while building
  the Lucene IndexWriter.

Co-authored-by: Xiaojian Zhou <[email protected]>
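
The catch-and-return-null approach described in the commit message can be sketched roughly as follows. This is only an illustration, not the actual Geode change: WriterFactorySketch, WriterBuilder and buildOrNull are hypothetical names, and the builder lambda stands in for the real Lucene IndexWriter construction.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical sketch; none of these names come from the Geode code base.
final class WriterFactorySketch {

  /** Stand-in for any construction step that may throw while reading index files. */
  @FunctionalInterface
  interface WriterBuilder<T> {
    T build() throws IOException;
  }

  /**
   * Runs the builder and returns its result, or null when an IOException is
   * thrown, so that the caller can decide to retry later.
   */
  static <T> T buildOrNull(WriterBuilder<T> builder) {
    try {
      return builder.build();
    } catch (IOException e) {
      // Assumption: the failure is transient (files changed underneath us),
      // so it is reported and signalled with null instead of being re-thrown.
      System.err.println("Writer creation failed, caller may retry: " + e);
      return null;
    }
  }

  public static void main(String[] args) {
    String ok = buildOrNull(() -> "writer");
    String failed = buildOrNull(() -> {
      throw new FileNotFoundException("segments_1");
    });
    System.out.println(ok + " / " + failed);
  }
}
```

Returning null keeps the method signature unchanged, at the cost of requiring every caller to check for null before using the result.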

> Lucene IndexWriter Creation Failure 
> ------------------------------------
>
>                 Key: GEODE-7703
>                 URL: https://issues.apache.org/jira/browse/GEODE-7703
>             Project: Geode
>          Issue Type: Bug
>          Components: lucene
>            Reporter: Juan Ramos
>            Assignee: Juan Ramos
>            Priority: Major
>              Labels: GeodeCommons
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> While computing the index repository, the initialization might fail if there 
> are modifications happening to the {{fileAndChunk}} region while the 
> {{IndexWriter}} is being initialized.
> The exception stack trace varies from run to run, but it always involves an 
> {{IOException}} with different causes while reading the index file. Some 
> examples are shown below:
> {noformat}
> Caused by: java.io.FileNotFoundException: segments_1
>       at org.apache.geode.cache.lucene.internal.filesystem.FileSystem.getFile(FileSystem.java:101)
>       at org.apache.geode.cache.lucene.internal.directory.RegionDirectory.openInput(RegionDirectory.java:115)
>       at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
>       at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:286)
>       at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:165)
>       at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:974)
>       at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finishComputingRepository(IndexRepositoryFactory.java:130)
>       at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:67)
>       at org.apache.geode.cache.lucene.internal.IndexRepositoryFactoryDistributedTest.lambda$testBecomePrimaryWhileIndexing$566b4a0f$5(IndexRepositoryFactoryDistributedTest.java:224)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
>       at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:78)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>       at sun.rmi.transport.Transport$1.run(Transport.java:200)
>       at sun.rmi.transport.Transport$1.run(Transport.java:197)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>       at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}
> {noformat}
> Caused by: java.io.EOFException: Read past end of file _3z.si
>       at org.apache.geode.cache.lucene.internal.directory.FileIndexInput.readByte(FileIndexInput.java:103)
>       at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
>       at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)
>       at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
>       at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)
>       at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:93)
>       at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
>       at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
>       ... 20 more
>       Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: unexpected exception (resource=BufferedChecksumIndexInput(_3z.si))
>               at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:471)
>               at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:252)
>               ... 22 more
>       Caused by: java.io.EOFException: Read past end of file _3z.si
>               at org.apache.geode.cache.lucene.internal.directory.FileIndexInput.readBytes(FileIndexInput.java:124)
>               at org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedChecksumIndexInput.java:49)
>               at org.apache.lucene.store.DataInput.readBytes(DataInput.java:87)
>               at org.apache.lucene.store.DataInput.skipBytes(DataInput.java:350)
>               at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:458)
>               ... 23 more
> {noformat}
> The issue itself is extremely hard to reproduce because the time window for 
> the race is rather small. The solution is to return null from the 
> {{IndexRepositoryFactory}} whenever the exception happens and let the caller 
> retry (the internal logic for doing this is already in place).
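
For readers unfamiliar with the caller side of this pattern, a retry loop around a factory that may return null could look roughly like the sketch below. Geode's own retry logic is not shown in this message, so everything here is an assumption for illustration: RetrySketch, computeWithRetry and the maxAttempts bound are hypothetical names, and the supplier stands in for the repository computation.

```java
import java.util.function.Supplier;

// Hypothetical sketch; this is not Geode's internal retry logic.
final class RetrySketch {

  /**
   * Calls the factory until it yields a non-null result or the attempt budget
   * is exhausted. Returns null if every attempt returned null.
   */
  static <T> T computeWithRetry(Supplier<T> factory, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      T result = factory.get();
      if (result != null) {
        return result; // the transient failure cleared up
      }
    }
    return null; // still failing: leave the decision to the caller
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Simulated factory: returns null on the first two calls, then a value.
    String repository = computeWithRetry(() -> ++calls[0] < 3 ? null : "repository", 5);
    System.out.println(repository + " after " + calls[0] + " attempts");
  }
}
```

Bounding the number of attempts is a design choice of this sketch; it avoids spinning forever if the failure turns out not to be transient.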



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
