paulirwin opened a new issue, #1298:
URL: https://github.com/apache/lucenenet/issues/1298

   ### Is there an existing issue for this?
   
   - [x] I have searched the existing issues
   
   ### Describe the bug
   
   Split out from #1295. This is confirmed to be a platform- and 
target-agnostic race bug, not anything specific to net10.0, although it is 
possible that it surfaces more often on net10.0 for framework reasons that 
cannot be determined.
   
   Original test failure report:
   ```
   Expected: True  Actual: False
   (Test: Lucene.Net.Index.TestTransactions.TestTransactions_Mem)
   
   
   To reproduce this test result:
   
   
   Option 1:
   
   
   Apply the following assembly-level attributes:
   
   
   [assembly: Lucene.Net.Util.RandomSeed("0x05988a4671cb8d53:0x175a1893dc7e9151")]
   [assembly: NUnit.Framework.SetCulture("ff-Latn-SN")]
   
   
   Option 2:
   
   
   Use the following .runsettings file:
   
   
   <RunSettings>
     <TestRunParameters>
       <Parameter name="tests:seed" value="0x05988a4671cb8d53:0x175a1893dc7e9151" />
       <Parameter name="tests:culture" value="ff-Latn-SN" />
     </TestRunParameters>
   </RunSettings>
   
   
   Option 3:
   
   
   Create the following lucene.testsettings.json file somewhere between the 
test assembly and the root of your drive:
   
   
   {
   "tests": {
   "seed": "0x05988a4671cb8d53:0x175a1893dc7e9151",
   "culture": "ff-Latn-SN"
   }
   }
   
   
   Fixture Test Values
   
   Random Seed:           0x05988a4671cb8d53:0x175a1893dc7e9151
   Culture:               ff-Latn-SN
   Time Zone:             (UTC-05:00) Eastern Time (Port-au-Prince)
   Default Codec:         Lucene46 (RandomCodec)
   Default Similarity:    DefaultSimilarity
   
   
   System Properties
   
   Nightly:               False
   Weekly:                False
   Slow:                  True
   Awaits Fix:            False
   Directory:             random
   Verbose:               False
   Random Multiplier:     1
   ```
   
   `TestTransactions.IndexerThread.DoWork` catches any exception thrown by 
PrepareCommit, rolls back the writers, and returns. It does this because the 
test does not care about the details of the exception, only that the 
transactional protocol works correctly in the presence of random I/O failures.
   
   However, it does not catch exceptions thrown by Commit. Forcing Commit to 
throw reproduces this test failure. Wrapping the Commit call in a try/catch, as 
is already done for the PrepareCommit calls before it, fixes the artificially 
forced failure.
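
   A minimal sketch of the proposed shape, assuming nothing about the real test beyond what is described above. `TryTwoPhaseCommit` and its delegate parameters are hypothetical stand-ins for the writer calls in `DoWork`, not the actual Lucene.NET test code.

   ```csharp
   using System;
   using System.IO;

   // Hypothetical stand-in for the commit step of DoWork (not the real test code).
   // Each Action represents one writer call and may throw an injected exception.
   // Returns true if both writers committed, false if the transaction was rolled back.
   static bool TryTwoPhaseCommit(
       Action prepare1, Action prepare2,
       Action commit1, Action commit2,
       Action rollbackBoth)
   {
       try
       {
           prepare1();
           prepare2();
       }
       catch (Exception)
       {
           // Existing behavior: a PrepareCommit failure rolls back and returns.
           rollbackBoth();
           return false;
       }

       try
       {
           commit1();
           commit2();
       }
       catch (Exception)
       {
           // Proposed change: handle a Commit failure the same way.
           rollbackBoth();
           return false;
       }

       return true;
   }

   // Usage: force the first Commit to throw, as in the artificial reproduction.
   int rollbacks = 0;
   bool committed = TryTwoPhaseCommit(
       prepare1: () => { },
       prepare2: () => { },
       commit1: () => throw new IOException("forced failure in Commit"),
       commit2: () => { },
       rollbackBoth: () => rollbacks++);

   Console.WriteLine($"committed={committed}, rollbacks={rollbacks}");
   // prints "committed=False, rollbacks=1"
   ```

   With the second catch in place, a failure in Commit ends in the same rolled-back state as a failure in PrepareCommit, instead of escaping the thread.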
   
   This appears to simply be a bug (or perhaps a limitation, to put it more 
mildly) in the test code, and the same limitation exists in the Java code. They 
have likely run into this failure occasionally too.
   
   **Is this .NET 10 related?** It does not appear to be. By forcing a failure 
in Commit, the test failure can be reliably reproduced on .NET 8-10 (.NET 
Framework has not been tried yet). Without forcing it, several hours of 
repeated, focused runs of this test did not show any failures, so it is not 
easily reproducible as-is. It is always possible that performance differences 
in new framework versions cause races to appear more or less frequently, 
nondeterministically.
   
   **Why is it rare?** For this scenario to happen, the following things have 
to be true:
   
   1. The first PrepareCommit call has to succeed. Given the many calls it 
makes where it can randomly fail, the probability of this is very low. A rough 
estimate from tracing the logic is that it happens about 0.01% of the time, 
based on purposefully thrown exceptions alone.
   2. The second PrepareCommit call has to succeed. The chance that both 
calls succeed is roughly the square of the probability in item 1.
   3. One of the two Commit calls has to throw. This is also not guaranteed, 
but it is more likely than not once you get to this point.
   
   In 500 repeated runs of the test on .NET 10 (macOS, arm64), with 
instrumentation added to count how often each call threw, the results are 
striking:
   
   - PrepareCommit call 1 threw 1531 times (100% of the time)
   - PrepareCommit call 2 threw 0 times (did not get there)
   - Commit threw 0 times (did not get there)
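
   As a rough consistency check, the 0.01% estimate can be compared against these counts. The sketch below assumes each first PrepareCommit call succeeds independently with that estimated probability; this is a simplification, not something measured. Under that assumption, about 0.15 successes would be expected across 1531 first calls, and seeing none at all has a probability of roughly 86%.

   ```csharp
   using System;

   // Assumption (not measured): each first PrepareCommit call succeeds
   // independently with the rough 0.01% estimate given above.
   const double pPrepareSucceeds = 0.0001;
   const int observedFirstCalls = 1531; // first-call count reported above

   // Expected number of first-call successes across all observed calls.
   double expectedSuccesses = observedFirstCalls * pPrepareSucceeds;

   // Probability that every observed first call throws.
   double pAllThrow = Math.Pow(1 - pPrepareSucceeds, observedFirstCalls);

   // Probability that both PrepareCommit calls succeed in a single attempt.
   double pBothSucceed = pPrepareSucceeds * pPrepareSucceeds;

   Console.WriteLine($"expected first-call successes: {expectedSuccesses:F3}");
   Console.WriteLine($"P(all first calls throw):      {pAllThrow:F3}");
   Console.WriteLine($"P(both PrepareCommit succeed): {pBothSucceed:E1}");
   ```

   So the instrumented results are consistent with the estimate, and reaching Commit at all would take on the order of 10^8 attempts under the same assumption.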
   
   **Solution:** We should catch and swallow exceptions in Commit for this test 
and roll back, since that is not the functionality under test. In fact, the 
functionality under test is precisely expecting that exceptions _do_ happen. 
Expecting them not to happen is not the goal of the test. We should handle a 
failed Commit the same way as a failed PrepareCommit call.
   
   **Aside:** The test is arguably poorly designed if PrepareCommit throws 
roughly 100% of the time on the first call. If that is the case, it might as 
well be set to always throw and not even attempt a second PrepareCommit or the 
Commit step. A better solution might be to configure this test to throw random 
exceptions _less_ often. That would let it exercise the transactional behavior 
more thoroughly across scenarios of both failure AND success, and there might 
then be different doc counts to assert when it occasionally gets through Commit 
successfully. Currently, in the very rare scenario where it gets past all 4 
calls and succeeds, nothing tells us that it happened. Regardless, we would 
still need the catch around Commit, since Commit is expected to fail if the 
test gets to it.
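
   To illustrate how the failure rate shapes the mix of outcomes, here is a small simulation under a hypothetical model. It is not how the Lucene.NET test injects failures. The `Simulate` function, the per-checkpoint `failureRate`, and the figure of 5 checkpoints per call are all assumptions chosen for illustration.

   ```csharp
   using System;

   // Hypothetical model (not the Lucene.NET test): each of the 4 calls
   // (2x PrepareCommit, 2x Commit) passes `checkpointsPerCall` failure checkpoints,
   // and each checkpoint throws with probability `failureRate`.
   static (int rolledBack, int committed) Simulate(
       double failureRate, int checkpointsPerCall, int attempts, int seed)
   {
       var random = new Random(seed);
       int rolledBack = 0, committed = 0;

       for (int attempt = 0; attempt < attempts; attempt++)
       {
           bool failed = false;
           int totalCheckpoints = 4 * checkpointsPerCall;
           for (int checkpoint = 0; checkpoint < totalCheckpoints && !failed; checkpoint++)
           {
               // NextDouble() is in [0, 1), so rate 0 never fails and rate 1 always fails.
               failed = random.NextDouble() < failureRate;
           }

           if (failed) rolledBack++; else committed++;
       }

       return (rolledBack, committed);
   }

   // Illustrative rates only; none of these values come from the real test.
   foreach (double rate in new[] { 0.4, 0.05, 0.01 })
   {
       var (rolledBack, committed) = Simulate(rate, checkpointsPerCall: 5, attempts: 1000, seed: 42);
       Console.WriteLine($"rate={rate}: rolledBack={rolledBack}, committed={committed}");
   }
   ```

   In this model, a lower rate lets a meaningful share of attempts reach a successful Commit, which is what would make assertions on differing doc counts possible.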
   
   ### Expected Behavior
   
   _No response_
   
   ### Steps To Reproduce
   
   _No response_
   
   ### Exceptions (if any)
   
   _No response_
   
   ### Lucene.NET Version
   
   _No response_
   
   ### .NET Version
   
   _No response_
   
   ### Operating System
   
   _No response_
   
   ### Anything else?
   
   _No response_


