GitHub user laimis opened a pull request:

    https://github.com/apache/lucenenet/pull/147

    synchronize access to underlying file stream for NIOFSDirectory

    The Lucene.Net port of NIOFSDirectory appears to have a problem when
multiple threads read from it concurrently. Any test based on
BasePostingsFormatTestCase, which runs multiple threads that read from the
directory, logs many errors when NIOFSDirectory is used as the FSDirectory
implementation.
    
    The failures are visible in the TC build logs. Here are some of the
highlights:
    
    System.Exception: Index was outside the bounds of the array. ---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
    at System.IO.FileStream.ReadByte()
    at Lucene.Net.Support.FileStreamExtensions.Read(FileStream file, ByteBuffer dst, Int64 position) in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Support\FileStreamExtensions.cs:line 22
    at Lucene.Net.Store.NIOFSDirectory.NIOFSIndexInput.ReadInternal(Byte[] b, Int32 offset, Int32 len) in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\NIOFSDirectory.cs:line 252
    at Lucene.Net.Store.BufferedIndexInput.Refill() in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\BufferedIndexInput.cs:line 368
    at Lucene.Net.Store.BufferedIndexInput.ReadByte() in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\BufferedIndexIn
    
    System.Exception: read past EOF: NIOFSIndexInput(path="Z:\Builds\temp\buildTmp\LuceneTemp\testPostingsFormat-1\_0.tis") off: 0 len: 543 pos: 24 chunkLen: 543 end: 567 ---> System.Exception: read past EOF: NIOFSIndexInput(path="Z:\Builds\temp\buildTmp\LuceneTemp\testPostingsFormat-1\_0.tis") off: 0 len: 543 pos: 24 chunkLen: 543 end: 567
    at Lucene.Net.Store.NIOFSDirectory.NIOFSIndexInput.ReadInternal(Byte[] b, Int32 offset, Int32 len) in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\NIOFSDirectory.cs:line 256
    at Lucene.Net.Store.BufferedIndexInput.Refill() in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\BufferedIndexInput.cs:line 368
    at Lucene.Net.Store.BufferedIndexInput.ReadByte() in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\BufferedIndexInput.cs:line 55
    at Lucene.Net.Store.DataInput.ReadVInt() in z:\Builds\work\bcdbe6b8cc677a49\src\Lucene.Net.Core\Store\DataInput.cs:line 117
    
    docID is wrong
    Expected: 208378
    But was:  208410
    
    The tests don't fail, because the pieces being tested still pass, but I
think this is causing failures in other tests when NIOFSDirectory is picked at
random. After digging into this some more, it looks like the Lucene
implementation relies on the JRE-specific FileChannel construct, which is not
available in .NET. The replacement in FileStreamExtensions is not thread safe
because of the FileStream seek calls it makes. Synchronizing that operation
made all the tests run clean. I reran the whole core test suite with
NIOFSDirectory as the Directory implementation and everything appears to pass.
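
To illustrate the difference described above, here is a minimal Java sketch. It is not the code from this pull request, and the class and method names (PositionalReadSketch, readAt, readAtWithSeek) are hypothetical. The first helper uses the positional FileChannel.read(ByteBuffer, long), which leaves the channel's own position untouched. The second emulates it with a seek followed by a read on a shared cursor, which is roughly the shape of a FileStream-based replacement, and guards the pair with a lock. The actual change in FileStreamExtensions may be structured differently.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene or Lucene.Net.
final class PositionalReadSketch {

    // Positional read: reads at an absolute offset and does not modify the
    // channel's own position, so threads sharing one channel do not disturb
    // each other's reads.
    static int readAt(FileChannel channel, ByteBuffer dst, long position) throws IOException {
        return channel.read(dst, position);
    }

    // Seek-then-read emulation: both calls use the file's single cursor, so
    // the pair is locked to stop another thread from moving the cursor
    // between the seek and the read.
    static int readAtWithSeek(RandomAccessFile file, byte[] dst, long position) throws IOException {
        synchronized (file) {
            file.seek(position);
            return file.read(dst, 0, dst.length);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("positional-read", ".bin");
        try {
            Files.write(path, new byte[] {10, 20, 30, 40, 50, 60});

            try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                ByteBuffer dst = ByteBuffer.allocate(2);
                int n = readAt(channel, dst, 2);
                System.out.println(n + " bytes read, first = " + dst.get(0));
            }

            try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r")) {
                byte[] dst = new byte[2];
                int n = readAtWithSeek(file, dst, 4);
                System.out.println(n + " bytes read, first = " + dst[0]);
            }
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Without the lock in readAtWithSeek, two threads could interleave their seek and read calls and each read from the other's offset, which would be consistent with the wrong docID and read-past-EOF errors shown in the logs.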
    
    I'm not sure what the performance implications are, since the purpose of
NIOFSDirectory appears to be to provide an optimized alternative to
SimpleFSDirectory that is fast when read from concurrent threads. As it
stands, the existing implementation is not safe to use from multiple threads,
so adding the synchronization may be the only option for us.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/laimis/lucenenet niofsdirectory_synchronization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucenenet/pull/147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #147
    
----
commit ae1fc2a32b4eb3a347b301c8cb62dfad85acd6db
Author: Laimonas Simutis <[email protected]>
Date:   2015-05-17T16:10:26Z

    synchronize access to underlying file stream for NIOFSDirectory

----

