NightOwl888 commented on code in PR #1028: URL: https://github.com/apache/lucenenet/pull/1028#discussion_r1848113664
########## src/Lucene.Net.Facet/Taxonomy/WriterCache/CharBlockArray.cs: ########## @@ -310,29 +248,8 @@ public virtual CharBlockArray Append(string? value, int startIndex, int length) if (startIndex > value.Length - length) throw new ArgumentOutOfRangeException(nameof(startIndex), $"Index and length must refer to a location within the string. For example {nameof(startIndex)} + {nameof(length)} <= {nameof(Length)}."); - - int offset = startIndex; - int remain = length; - while (remain > 0) - { - if (this.current.length == this.blockSize) - { - AddBlock(); - } - int toCopy = remain; - int remainingInBlock = this.blockSize - this.current.length; - if (remainingInBlock < toCopy) - { - toCopy = remainingInBlock; - } - value.CopyTo(offset, this.current.chars, this.current.length, toCopy); Review Comment: See my note above about copying performance. ########## src/Lucene.Net/Analysis/TokenAttributes/CharTermAttributeImpl.cs: ########## @@ -216,10 +216,8 @@ public CharTermAttribute Append(string value, int startIndex, int charCount) if (startIndex > value.Length - charCount) throw new ArgumentOutOfRangeException(nameof(startIndex), $"Index and length must refer to a location within the string. For example {nameof(startIndex)} + {nameof(charCount)} <= {nameof(Length)}."); - value.CopyTo(startIndex, InternalResizeBuffer(termLength + charCount), termLength, charCount); Review Comment: See my note above about copying performance. ########## src/Lucene.Net.Facet/Taxonomy/WriterCache/CharBlockArray.cs: ########## @@ -235,29 +214,8 @@ public virtual CharBlockArray Append(char[]? value, int startIndex, int length) if (startIndex > value.Length - length) throw new ArgumentOutOfRangeException(nameof(startIndex), $"Index and length must refer to a location within the string. For example {nameof(startIndex)} + {nameof(length)} <= {nameof(Length)}."); - - int offset = startIndex; - int remain = length; - while (remain > 0) - { - if (this.current.length == this.blockSize) - { - AddBlock(); - } - int toCopy = remain; - int remainingInBlock = this.blockSize - this.current.length; - if (remainingInBlock < toCopy) - { - toCopy = remainingInBlock; - } - Arrays.Copy(value, offset, this.current.chars, this.current.length, toCopy); Review Comment: See my note above about copying performance. ########## src/Lucene.Net/Analysis/TokenAttributes/CharTermAttributeImpl.cs: ########## @@ -234,11 +232,8 @@ public CharTermAttribute Append(char[] value) //return AppendNull(); return this; // No-op - int len = value.Length; - value.CopyTo(InternalResizeBuffer(termLength + len), termLength); Review Comment: See my note above about copying performance. ########## src/Lucene.Net.Facet/Taxonomy/WriterCache/CharBlockArray.cs: ########## @@ -192,28 +191,8 @@ public virtual CharBlockArray Append(char[]? value) return this; // No-op } - int remain = value.Length; - int offset = 0; - while (remain > 0) - { - if (this.current.length == this.blockSize) - { - AddBlock(); - } - int toCopy = remain; - int remainingInBlock = this.blockSize - this.current.length; - if (remainingInBlock < toCopy) - { - toCopy = remainingInBlock; - } - Arrays.Copy(value, offset, this.current.chars, this.current.length, toCopy); Review Comment: In #766, it was discovered that on .NET Framework using `ReadOnlySpan<T>.Copy()` is significantly slower than other copy methods in most cases. There was an `IArrayCopier<T>` added to select the best copy method depending on data type and other factors: https://github.com/apache/lucenenet/pull/766/files#diff-c0b8165f7f169cb39fade950ceb5b519f42e4869be060bb3114716e453bf8d37. This is exposed as `Arrays.Copy<T>()`. It would be best to add the `Append(ReadOnlySpan<char>)` overload without modifying these other overloads, unless you want to go through benchmarking the char data type all over again or we are switching from a less performant copy to a more performant copy, such as `Arrays.Copy<T>()`. ########## src/Lucene.Net/Analysis/TokenAttributes/CharTermAttributeImpl.cs: ########## @@ -260,10 +255,8 @@ public CharTermAttribute Append(char[] value, int startIndex, int charCount) if (startIndex > value.Length - charCount) throw new ArgumentOutOfRangeException(nameof(startIndex), $"Index and length must refer to a location within the string. For example {nameof(startIndex)} + {nameof(charCount)} <= {nameof(Length)}."); - Arrays.Copy(value, startIndex, InternalResizeBuffer(termLength + charCount), termLength, charCount); Review Comment: See my note above about copying performance. ########## src/Lucene.Net.Tests/Analysis/TokenAttributes/TestCharTermAttributeImpl.cs: ########## @@ -326,6 +330,10 @@ public virtual void TestAppendableInterfaceWithLongSequences() const string longTestString = "012345678901234567890123456789"; t.Append(new CharSequenceAnonymousClass(longTestString)); Assert.AreEqual("4567890123456" + longTestString, t.ToString()); + + // LUCENENET specific - test ReadOnlySpan<char> overload + t.SetEmpty().Append("01234567890123456789012345678901234567890123456789".AsSpan()); + Assert.AreEqual("01234567890123456789012345678901234567890123456789", t.ToString()); Review Comment: Since `ReadOnlySpan<char>` is often a direct replacement for `CharBuffer` we should demonstrate how to slice `ReadOnlySpan<char>` the same way `CharBuffer` is sliced. ```c# t.Append(/*(ICharSequence)*/ CharBuffer.Wrap("01234567890123456789012345678901234567890123456789".ToCharArray()), 3, 50 - 3); // LUCENENET: Corrected 3rd parameter Assert.AreEqual("0123456789012345678901234567890123456789012345678934567890123456789012345678901234567890123456789", t.ToString()); ``` Should be translated as: ```c# t.Append("01234567890123456789012345678901234567890123456789".AsSpan(3, 50 - 3)); // LUCENENET: Corrected 2nd parameter Assert.AreEqual("0123456789012345678901234567890123456789012345678934567890123456789012345678901234567890123456789", t.ToString()); ``` It looks like we may need to adjust the expected value depending on where this is called, though. Maybe it would be best to break it out as a `TestSpanAppendableInterface()` test. Note that this test is more of a demo for users to show how to translate the Java code to .NET than a real test. ########## src/Lucene.Net/Analysis/TokenAttributes/CharTermAttribute.cs: ########## @@ -76,15 +77,15 @@ public interface ICharTermAttribute : IAttribute, ICharSequence, IAppendable /// the termBuffer array. Use this to truncate the termBuffer /// or to synchronize with external manipulation of the termBuffer. /// Note: to grow the size of the array, - /// use <see cref="ResizeBuffer(int)"/> first. - /// NOTE: This is exactly the same operation as calling the <see cref="Length"/> setter, the primary + /// use <see cref="ResizeBuffer(int)"/> first. + /// NOTE: This is exactly the same operation as calling the <see cref="Length"/> setter, the primary /// difference is that this method returns a reference to the current object so it can be chained. /// <code> /// obj.SetLength(30).Append("hey you"); /// </code> /// </summary> /// <param name="length"> the truncated length </param> - ICharTermAttribute SetLength(int length); + ICharTermAttribute SetLength(int length); Review Comment: There doesn't seem to be a good reason to force implementations to define `SetLength()`, since it should just cascade the call to the setter of `Length`. We should remove this from the interface and make this method into an extension method on `ICharTermAttribute` (in a `Lucene.Net.Analysis.TokenAttributes.Extensions` namespace). It should also be made to return the type of `ICharTermAttribute` it is called upon rather than the interface type. This can be made into a separate issue. ########## src/Lucene.Net/Analysis/TokenAttributes/CharTermAttribute.cs: ########## @@ -26,7 +26,8 @@ namespace Lucene.Net.Analysis.TokenAttributes /// <summary> /// The term text of a <see cref="Token"/>. /// </summary> - public interface ICharTermAttribute : IAttribute, ICharSequence, IAppendable + public interface ICharTermAttribute : IAttribute, ICharSequence, IAppendable, + ISpanAppendable /* LUCENENET specific */ Review Comment: This appears to be the breaking change you were referring to. I am debating whether it makes sense to add this interface here, since it was specifically separated out so that it wouldn't need to be a breaking change. But if we are going to break this, now is the time. Keep in mind, there is a degraded `Append(ReadOnlySpan<char>)` extension method that is functional if this isn't implemented. The implementation is just an optimization. So, while there is a clear performance benefit to forcing the user to implement this, it isn't strictly necessary. If the user does implement `ISpanAppendable` on the concrete type, it will use that implementation instead of the degraded one. Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org