Vincent, if you have the time, I'd appreciate your assistance with a fix for a long-standing concurrency bug. I have been putting together a wrapper console application for the various utilities that ship with Lucene and discovered that two of them are non-functional because of this bug, but on the upside, there is now a reliable way to reproduce it. I suspect the bug is also causing some of the random test failures we are seeing on certain FSDirectory implementations.
I have pushed the WIP application to my local repository (https://github.com/NightOwl888/lucenenet/tree/cli/src/tools/lucene-cli). It only runs on .NET Core and in Visual Studio 2015 Update 3. I don't think it makes sense to support .NET Framework for this utility, since .NET Core will run side-by-side with .NET Framework anyway. You can run specific commands directly on the command line or in Visual Studio 2015. There is a server that needs to be started first, and then a client that connects to it. The problem seems to be on the server side.

Command Line

    dotnet lucene-cli.dll lock verify-server 127.0.0.4 10
    dotnet lucene-cli.dll lock stress-test 3 127.0.0.4 <THE_PORT> NativeFSLockFactory F:\temp2 50 10

Note that the port is chosen dynamically by the server at runtime and displayed on the console.

Visual Studio 2015

In Visual Studio 2015, you can just copy everything after "dotnet lucene-cli.dll" and paste it into the project properties > Debug > Application Arguments text box. Note that I am not sure whether those options are optimal (or even whether they might be causing the issue).

What I Have Found

When the client calls the server, the server locks up at LockVerifyServer.cs line 129 (https://github.com/NightOwl888/lucenenet/blob/cli/src/Lucene.Net/Store/LockVerifyServer.cs#L129). I tried removing that line, and it gets a bit further and then crashes with this error:

    An unhandled exception of type 'System.Exception' occurred in System.Private.CoreLib.ni.dll
    Additional information: System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
       at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
       at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
       --- End of inner exception stack trace ---
       at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
       at System.IO.Stream.ReadByte()
       at System.IO.BinaryReader.InternalReadOneChar()
       at Lucene.Net.Store.LockVerifyServer.ThreadAnonymousInnerClassHelper.Run() in F:\Projects\lucenenet\src\Lucene.Net\Store\LockVerifyServer.cs:line 135

I suspect that has something to do with removing the wait, which throws the timing off, but I compared the thread handling code to some similar tests and it looks the same (including the call to Wait()), so I haven't worked out why that method call isn't completing in this case (see the sketch of the handshake pattern below).

I believe this bug is related to a couple of intermittently failing tests that also seem to indicate the LockFactory is broken:

    https://teamcity.jetbrains.com/viewLog.html?buildId=1101813&tab=buildResultsDiv&buildTypeId=LuceneNet_PortableBuilds_TestOnNet451
    https://teamcity.jetbrains.com/viewLog.html?buildId=1084071&tab=buildResultsDiv&buildTypeId=LuceneNet_PortableBuilds_TestOnNet451
    https://teamcity.jetbrains.com/viewLog.html?buildId=1071425&tab=buildResultsDiv&buildTypeId=LuceneNet_PortableBuilds_TestOnNet451

Namely, the TestLockFactory.StressTestLocks and TestLockFactory.TestStressLocksNativeFSLockFactory tests. FYI, the TestIndexWriter.TestTwoThreadsInterruptDeadlock test also fails intermittently and is apparently concurrency related. I don't recall which tests they were, but I discovered a while back that if you put the [Repeat(20)] attribute on them, they fail more consistently (see the example below). I also noticed that they always fail if MMapDirectory is the only option provided by the test framework.
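To make the handshake discussion above concrete, here is a minimal, self-contained sketch of the kind of pattern I believe is involved. This is not code from the repository: the CountdownEvent, the `expectedClients` parameter, and all other names are my own stand-ins, and the real LockVerifyServer is more involved. The point is only to illustrate why removing a wait of this kind could produce the exact IOException in the stack trace.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Threading;

public static class HandshakeSketch
{
    public static void RunServer(int expectedClients)
    {
        // Listen on an OS-assigned port, like the verify-server's dynamic port.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        Console.WriteLine($"Listening on port {((IPEndPoint)listener.LocalEndpoint).Port}");

        var allConnected = new CountdownEvent(expectedClients);
        var threads = new List<Thread>();

        for (int i = 0; i < expectedClients; i++)
        {
            TcpClient client = listener.AcceptTcpClient();
            var t = new Thread(() =>
            {
                using (client)
                using (NetworkStream stream = client.GetStream())
                {
                    allConnected.Signal(); // this connection has checked in
                    allConnected.Wait();   // hold here until every expected client is connected.
                                           // This is the role I assume the Wait() on line 129 plays;
                                           // without it, this thread can race ahead, finish, and
                                           // dispose the socket while the peer is still blocked in
                                           // Read(), producing a "connection was forcibly closed"
                                           // IOException on the other side.

                    int b = stream.ReadByte();   // read the client's handshake byte
                    stream.WriteByte((byte)b);   // echo it back
                }
            });
            threads.Add(t);
            t.Start();
        }

        foreach (var t in threads) t.Join();
        listener.Stop();
        allConnected.Dispose();
    }
}
```

In this sketch, commenting out the Wait() lets a fast connection complete its loop and dispose its socket before a slower client has finished its exchange, which mirrors the SocketException in the stack trace above. That is what makes me think the timing around that wait is central to the bug.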
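For reference, this is all I mean by the [Repeat(20)] trick: NUnit's [Repeat] attribute re-runs the test body N times in-process, which makes the timing-sensitive failures surface far more often. The fixture and method names below are placeholders; in practice the attribute goes on the existing TestLockFactory tests.

```csharp
using NUnit.Framework;

[TestFixture]
public class LockFactoryReproTests
{
    // Re-running the body 20 times makes the intermittent,
    // timing-sensitive failure show up much more consistently.
    [Test]
    [Repeat(20)]
    public void StressLocksNativeFSLockFactory_Repeated()
    {
        // ... body of the existing stress test goes here ...
    }
}
```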
Anyway, I would really appreciate it if you could have a look and see if you can work out what is going on. Thanks, Shad Storhaug (NightOwl888)
