[GitHub] [lucenenet] NightOwl888 opened a new pull request #325: Fix for LineFileDocs Bottleneck/Performance Improvements

GitBox Mon, 10 Aug 2020 17:17:21 -0700


NightOwl888 opened a new pull request #325:
URL: https://github.com/apache/lucenenet/pull/325

This fixes a bottleneck (see #261) caused by unzipping the line docs file into
RAM (~15MB) and then selecting a random line in it. The .NET `GZipStream` is
not seekable, so the entire contents were first copied into a `MemoryStream`.
Roughly 20% of the tests use this file, and each of them repeated the full
decompression.
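
To make the cost concrete, here is a minimal sketch of the pattern described above. It is not the code from the test framework: `GZipSeekDemo`, `Compress`, and `InflateToMemory` are hypothetical names used only for illustration, and the payload is built in memory so the example needs no external file.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, for illustration only.
public static class GZipSeekDemo
{
    // Builds a small gzip payload in memory so the sketch is self-contained.
    public static byte[] Compress(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            // The gzip stream is disposed here, so the trailer has been written.
            return output.ToArray();
        }
    }

    // The costly pattern: GZipStream.CanSeek is false, so the whole payload
    // is inflated into a MemoryStream just to obtain a seekable stream.
    public static MemoryStream InflateToMemory(byte[] compressed)
    {
        using (var gzip = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        {
            var buffer = new MemoryStream();
            gzip.CopyTo(buffer);   // full decompression on every call
            buffer.Position = 0;   // rewind so callers can seek or read from the start
            return buffer;
        }
    }
}
```

With a ~15MB file, every caller of something like `InflateToMemory` pays for the full decompression and the allocation, which is the repeated work this PR removes.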

The fix is to have the test framework unzip the file to a temp file on the
test machine. This happens in one of three ways:

1. If `LineFileDocs` is used directly in a class that does not specify
`LuceneTestCase.UseTempLineDocsFile = true`, `LineFileDocs` will unzip the file
before it is used (per instance of the class) and delete it when it is disposed.
2. If `LuceneTestCase.UseTempLineDocsFile = true` is specified in the test
fixture, the file will be unzipped in the `BeforeClass()` method and deleted in
the `AfterClass()` method.
3. If the test project makes heavy use of this file, adding a subclass of
`LuceneTestFrameworkInitializer` to the test project (outside of all
namespaces) will cause the file to be unzipped only once for all of the tests
in that project and deleted after the last test is finished.
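
The underlying technique in all three cases is the same: decompress once to a temp file, then open that file as an ordinary seekable stream. The following is a rough sketch of that idea under stated assumptions, not the actual `LineFileDocs` implementation. `TempLineDocsSketch` and its methods are hypothetical names, and the "seek to a random offset, discard the partial line" step is only an approximation of the seek behavior mentioned in this PR.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, for illustration only.
public static class TempLineDocsSketch
{
    // Decompresses the gzipped stream once into a temp file and returns its path.
    // The caller is responsible for deleting the file when finished.
    public static string ExtractToTempFile(Stream gzipped)
    {
        string path = Path.GetTempFileName();
        using (var gzip = new GZipStream(gzipped, CompressionMode.Decompress))
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            gzip.CopyTo(file);
        }
        return path;
    }

    // Seeks to a random byte offset, discards the (probably partial) line at
    // that offset, and returns the next full line. Returns null when the
    // offset landed inside the last line of the file.
    public static string ReadLineAfterRandomSeek(string path, Random random)
    {
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            long offset = (long)(random.NextDouble() * file.Length);
            file.Seek(offset, SeekOrigin.Begin);   // cheap: FileStream is seekable
            using (var reader = new StreamReader(new BufferedStream(file), Encoding.UTF8))
            {
                reader.ReadLine();        // skip the remainder of the current line
                return reader.ReadLine(); // first complete line after the offset
            }
        }
    }
}
```

The three options listed above differ only in the lifetime of the temp file: per `LineFileDocs` instance, per test fixture, or per test project.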

There are also several other patches in this PR:

- The seek behavior of `LineFileDocs` was reverted to Lucene's original
implementation, which has revealed some (potential) false positives in some of
the ICU tests. A `BufferedStream` was added to improve performance.
- Removed unnecessary variable allocations.
- Fixed a bug with the `Nightly`, `Weekly`, `Slow`, and `AwaitsFix`
attributes so they will wait until NUnit runs the initialization code before
running.
- Added a `DeadlockAttribute` to time out tests that now show threading
contention issues since raw speed was improved. This ensures that they fail in
the CI environment if they actually deadlock, and it can also be used to filter
these tests out of a run.
- Simplified some expressions to make them easier to maintain.
- Commented out dead code and unnecessary variable declarations that were
carried over from Java.
- Fixed a bug in the `ICUTokenizer` where it was calling
`System.Char.IsWhiteSpace()` when it should have been calling
`ICU4N.UChar.IsWhiteSpace()` to ensure it is reading the correct version of ICU.
- Changed implementation of `DisposableThreadLocal` to that of RavenDB,
[with permission from its
maintainers](https://issues.apache.org/jira/browse/LUCENENET-640?focusedCommentId=17033146&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17033146).
(closes #251)
