GitHub user laimis opened a pull request:
https://github.com/apache/lucenenet/pull/147
synchronize access to underlying file stream for NIOFSDirectory
The Lucene.Net port of NIOFSDirectory appears to have a problem when
multiple threads read from it concurrently. Any test based on
BasePostingsFormatTestCase, which runs multiple threads that read from the
directory, logs a lot of errors when NIOFSDirectory is used as the
FSDirectory implementation.
You can see the failures in the TC build logs. Here are some of the
highlights:
System.Exception: Index was outside the bounds of the array. ---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.IO.FileStream.ReadByte()
   at Lucene.Net.Support.FileStreamExtensions.Read(FileStream file, ByteBuffer dst, Int64 position) in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Support\FileStreamExtensions.cs:line 22
   at Lucene.Net.Store.NIOFSDirectory.NIOFSIndexInput.ReadInternal(Byte[] b, Int32 offset, Int32 len) in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\NIOFSDirectory.cs:line 252
   at Lucene.Net.Store.BufferedIndexInput.Refill() in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\BufferedIndexInput.cs:line 368
   at Lucene.Net.Store.BufferedIndexInput.ReadByte() in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\BufferedIndexIn
System.Exception: read past EOF: NIOFSIndexInput(path="Z:\Builds\temp\buildTmp\LuceneTemp\testPostingsFormat-1\_0.tis") off: 0 len: 543 pos: 24 chunkLen: 543 end: 567 ---> System.Exception: read past EOF: NIOFSIndexInput(path="Z:\Builds\temp\buildTmp\LuceneTemp\testPostingsFormat-1\_0.tis") off: 0 len: 543 pos: 24 chunkLen: 543 end: 567
   at Lucene.Net.Store.NIOFSDirectory.NIOFSIndexInput.ReadInternal(Byte[] b, Int32 offset, Int32 len) in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\NIOFSDirectory.cs:line 256
   at Lucene.Net.Store.BufferedIndexInput.Refill() in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\BufferedIndexInput.cs:line 368
   at Lucene.Net.Store.BufferedIndexInput.ReadByte() in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\BufferedIndexInput.cs:line 55
   at Lucene.Net.Store.DataInput.ReadVInt() in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\DataInput.cs:line 117
docID is wrong
Expected: 208378
But was: 208410
These tests don't fail, because the pieces actually being tested pass, but I
think this is causing failures in other tests when NIOFSDirectory is picked at
random. After digging into this further, it seems the Lucene implementation
relies on the JRE-specific FileChannel construct, which is not available in
.NET. The replacement in FileStreamExtensions is not thread-safe because of the
FileStream seek calls it makes: one thread can move the shared stream position
between another thread's seek and its read. Synchronizing that operation made
all the tests clean. I reran the whole core test suite with NIOFSDirectory as
the Directory implementation and everything seems to pass.
I'm not sure what performance implications this has, since the purpose of
NIOFSDirectory appears to be to provide an optimized alternative to
SimpleFSDirectory that stays fast when read from concurrent threads. As it
stands, the existing implementation is not safe to use from multiple threads,
so adding the synchronization may be the only option for us.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/laimis/lucenenet niofsdirectory_synchronization
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/lucenenet/pull/147.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #147
----
commit ae1fc2a32b4eb3a347b301c8cb62dfad85acd6db
Author: Laimonas Simutis <[email protected]>
Date: 2015-05-17T16:10:26Z
synchronize access to underlying file stream for NIOFSDirectory
----