[lucenenet] branch master updated (31ac9f6b7 -> 3fbee37ed)

nightowl888 Sun, 30 Oct 2022 23:19:13 -0700

This is an automated email from the ASF dual-hosted git repository.

nightowl888 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/lucenenet.git



    from 31ac9f6b7 Sonar changes required for #671 (#730)
     new e8c34943e PERFORMANCE: 
Lucene.Net.Document.CompressionTools::CompressString(): Eliminated unnecessary 
ToCharArray() allocation
     new eea43d74b PERFORMANCE: 
Lucene.Net.Codecs.SimpleText.SimpleTextUtil::Write(): Removed unnecessary 
ToCharArray() allocation
     new e8d9d7b94 PERFORMANCE: 
Lucene.Net.Analysis.CharFilters.HTMLStripCharFilter: Removed allocation during 
parse of hexadecimal number by using J2N.Numerics.Int32 to specify index and 
length. Also added a CharArrayFormatter struct to defer the allocation of 
constructing a string until after an assertion failure.
     new 98c52a864 PERFORMANCE: Lucene.Net.Analysis.Util.CharacterUtils: Use 
spans and stackalloc to reduce heap allocations when lowercasing. Added system 
property named "maxStackLimit" that defaults to 2048 bytes.
     new 3abf2dbfe PERFORMANCE: 
Lucene.Net.Analysis.Miscellaneous.StemmerOverrideFilter: Added overloads to Add 
for ICharSequence and char[] to reduce allocations. Added guard clauses.
     new 31bbe9f35 Lucene.Net.Util.TestUnicodeUtil::TestUTF8toUTF32(): Added 
tests for the ICharSequence and char[] overloads and changed the original 
test to test the string overload.
     new e3606755c PERFORMANCE: 
Lucene.Net.Analysis.Util.SegmentingTokenizerBase: Removed unnecessary string 
allocations that were added during the port due to missing APIs.
     new 3e63d1529 PERFORMANCE: Lucene.Net.Analysis.Ja.GraphvizFormatter: 
Removed unnecessary surfaceForm string allocation.
     new fca99681e PERFORMANCE: Lucene.Net.Analysis.In.IndicNormalizer: 
Replaced static constructor with inline LoadScripts() method. Moved location of 
scripts field to ensure decompositions is initialized first.
     new d660b9d51 PERFORMANCE: Lucene.Net.Analysis.In.IndicNormalizer: 
Refactored ScriptData to change Dictionary<Regex, ScriptData> to 
List<ScriptData> and eliminated unnecessary hashtable lookup. Use static fields 
for unknownScript and [ThreadStatic] previousScriptData to optimize character 
script matching.
     new e72315a75 PERFORMANCE: Lucene.Net.Analysis.Th.ThaiWordBreaker: Removed 
unnecessary string allocations and concatenation. Use CharsRef to reuse the 
same memory. Removed Regex and replaced with UnicodeSet to detect Thai code 
points.
     new 56c8e08e0 PERFORMANCE: Lucene.Net.Analysis.Ga.IrishLowerCaseFilter: 
Use stack and spans to reduce allocations and improve throughput.
     new 1d74f980a PERFORMANCE: Lucene.Net.Analysis.Util.OpenStringBuilder: 
Added overloads of UnsafeWrite() for string and ICharSequence. Optimized 
Append() methods to call UnsafeWrite with index and count to optimize the 
operation depending on the type of object passed.
     new 3fbee37ed PERFORMANCE: Lucene.Net.Analysis.CharFilters.HTMLStripCharFilter: 
Refactored to remove the YyText property (method), which allocates a string every 
time it is called. Instead, we pass the underlying array to 
J2N.Numerics.TryParse() and OpenStringBuilder.Append() with the calculated 
startIndex and length to directly copy the characters without allocating 
substrings.

The 14 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
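Commits e8d9d7b94 and 3fbee37ed avoid substring allocations by passing the underlying char buffer together with a start index and length, both when parsing a hexadecimal number and when appending to OpenStringBuilder. As a rough illustration only, here is a minimal Java sketch of that pattern. It is not the Lucene.NET code, which uses J2N.Numerics.Int32 and OpenStringBuilder; the SliceParsing class and its tryParseHex and appendSlice methods are hypothetical names, while StringBuilder.append(char[], int, int), Character.digit and Objects.checkFromIndexSize are standard Java APIs.

```java
import java.util.Objects;

// Hypothetical helper (not part of Lucene.NET) showing the index-and-length pattern:
// read or copy a slice of a char[] buffer without allocating a substring.
final class SliceParsing {
    // Up to 7 hex digits keeps the result non-negative, so -1 can serve as a failure value.
    private static final int MAX_HEX_DIGITS = 7;

    private SliceParsing() {}

    // Parses buffer[start .. start + length) as hexadecimal.
    // Returns -1 if the slice is empty, too long, or contains a non-hex character.
    // Note: Character.digit also accepts non-ASCII Unicode digits (e.g. fullwidth).
    static int tryParseHex(char[] buffer, int start, int length) {
        if (length <= 0 || length > MAX_HEX_DIGITS) {
            return -1;
        }
        // Throws IndexOutOfBoundsException if the slice does not fit in the buffer.
        Objects.checkFromIndexSize(start, length, buffer.length);
        int value = 0;
        for (int i = start; i < start + length; i++) {
            int digit = Character.digit(buffer[i], 16);
            if (digit < 0) {
                return -1;
            }
            value = (value << 4) | digit;
        }
        return value;
    }

    // Copies the slice straight from the buffer into the builder; no intermediate String.
    static void appendSlice(StringBuilder sb, char[] buffer, int start, int length) {
        sb.append(buffer, start, length);
    }
}
```

For the input "&#x41;", tryParseHex(text, 3, 2) reads the two characters "41" in place and returns 65, and appendSlice(sb, text, 3, 2) appends "41" without creating a String for the slice first.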
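Commit d660b9d51 replaces a Dictionary<Regex, ScriptData> with a List<ScriptData>, adds a static unknown-script value, and caches the previous match in a [ThreadStatic] field. A loose Java analogue is sketched below, assuming ThreadLocal in place of [ThreadStatic] and a hypothetical Script type that matches a single code point range instead of the real ScriptData matching logic. The script names and ranges are illustrative only.

```java
import java.util.List;

// Hypothetical sketch (not the Lucene.NET IndicNormalizer code): scan a small List
// instead of hashing, try a per-thread cache of the last match first, and return a
// shared sentinel when nothing matches.
final class ScriptLookup {
    // Hypothetical stand-in for ScriptData: a named, inclusive code point range.
    static final class Script {
        final String name;
        final int first;
        final int last;

        Script(String name, int first, int last) {
            this.name = name;
            this.first = first;
            this.last = last;
        }

        boolean contains(int codePoint) {
            return codePoint >= first && codePoint <= last;
        }
    }

    // Shared sentinel for unmatched characters; its range (-1) never matches a valid code point.
    static final Script UNKNOWN = new Script("Unknown", -1, -1);

    private static final List<Script> SCRIPTS = List.of(
            new Script("Devanagari", 0x0900, 0x097F),
            new Script("Bengali", 0x0980, 0x09FF),
            new Script("Tamil", 0x0B80, 0x0BFF));

    // Per-thread cache of the previous match; neighbouring characters usually share a script.
    private static final ThreadLocal<Script> PREVIOUS = ThreadLocal.withInitial(() -> UNKNOWN);

    private ScriptLookup() {}

    static Script find(int codePoint) {
        Script previous = PREVIOUS.get();
        if (previous.contains(codePoint)) {
            return previous; // cache hit: no scan needed
        }
        for (Script script : SCRIPTS) {
            if (script.contains(codePoint)) {
                PREVIOUS.set(script);
                return script;
            }
        }
        return UNKNOWN;
    }
}
```

With only a handful of entries, a linear scan is likely cheap, and for runs of same-script text the cache check usually returns before the scan starts.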


Summary of changes:
 .build/dependencies.props                          |   1 +
 .../Analysis/CharFilter/HTMLStripCharFilter.cs     | 222 ++++++++++++---------
 .../Analysis/Ga/IrishLowerCaseFilter.cs            |  15 +-
 .../Analysis/In/IndicNormalizer.cs                 | 121 +++++++----
 .../Miscellaneous/StemmerOverrideFilter.cs         | 116 ++++++++++-
 .../Analysis/Nl/DutchAnalyzer.cs                   |   2 +-
 .../Analysis/Th/ThaiTokenizer.cs                   |  24 ++-
 .../Analysis/Th/ThaiWordFilter.cs                  |   4 +-
 .../Analysis/Util/CharacterUtils.cs                |  40 ++--
 .../Analysis/Util/OpenStringBuilder.cs             |  54 +++--
 .../Analysis/Util/SegmentingTokenizerBase.cs       |   6 +-
 .../Lucene.Net.Analysis.Common.csproj              |   8 +
 .../GraphvizFormatter.cs                           |   7 +-
 src/Lucene.Net.Codecs/SimpleText/SimpleTextUtil.cs |   2 +-
 .../Analysis/In/TestIndicNormalizer.cs             |  10 +-
 .../Miscellaneous/TestStemmerOverrideFilter.cs     |  27 +++
 .../Configuration/TestConfigurationService.cs      |   8 +
 .../Startup.cs                                     |   3 +-
 src/Lucene.Net.Tests/Support/TestApiConsistency.cs |   2 +-
 src/Lucene.Net.Tests/Util/TestUnicodeUtil.cs       |  82 ++++++++
 src/Lucene.Net/Document/CompressionTools.cs        |   4 +-
 src/Lucene.Net/Lucene.Net.csproj                   |   5 +-
 .../Support/Text/CharArrayFormatter.cs}            |  20 +-
 src/Lucene.Net/Util/Constants.cs                   |   7 +-
 src/Lucene.Net/Util/UnicodeUtil.cs                 | 149 +++++++++++++-
 25 files changed, 721 insertions(+), 218 deletions(-)
 copy src/{Lucene.Net.QueryParser/Flexible/Core/Nodes/MatchNoDocsQueryNode.cs => Lucene.Net/Support/Text/CharArrayFormatter.cs} (61%)
