NightOwl888 commented on code in PR #1018:
URL: https://github.com/apache/lucenenet/pull/1018#discussion_r1842096611
##########
src/Lucene.Net.Tests/Analysis/TestNumericTokenStream.cs:
##########

@@ -33,10 +32,12 @@ public class TestNumericTokenStream : BaseTokenStreamTestCase
         internal const long lvalue = 4573245871874382L;
         internal const int ivalue = 123456;

+        // LUCENENET note - Have to explicitly use NUnit.Framework.TestAttribute in this class
+        // due to conflict with TestAttribute below
         [NUnit.Framework.Test]

Review Comment:
This is a potential time bomb. If there is a test framework feature that requires us to use a custom `TestAttribute` to override what NUnit does, this will bypass our custom attribute. To override NUnit's attribute, we would nest `TestAttribute` within `LuceneTestCase` so it takes precedence over NUnit's. We already do this with the `TestFixtureAttribute`, and AFAIK NUnit doesn't provide any avenue for extension other than creating custom attributes.

So, I think we should rename our `TestAttribute` below to something else (perhaps `TestFakeAttribute` and `ITestFakeAttribute`) to distinguish it from NUnit's attribute. Then we don't have to use an explicit namespace to use NUnit's `TestAttribute`, and if we customize NUnit's attribute, this won't silently fail to use our test framework over NUnit.

##########
src/Lucene.Net.Tests/Analysis/TestNumericTokenStream.cs:
##########

@@ -79,7 +82,8 @@ public virtual void TestIntStream()
             }
             Assert.IsFalse(stream.IncrementToken(), "More tokens available");
             stream.End();
-            stream.Dispose();
+            // LUCENENET specific - stream disposed above via using statement
+            // stream.Dispose();
         }

         [NUnit.Framework.Test]

Review Comment:
See my note above about why we shouldn't be using explicit namespaces for NUnit attributes.
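For reference, the nesting approach described above (the one we already use for `TestFixtureAttribute`) might look something like the following. This is only an illustrative sketch, not the actual test framework code; the attribute body and usage flags are assumptions:

```c#
// Hypothetical sketch - not the real LuceneTestCase source.
// Because nested type names shadow names imported via using directives,
// any test class deriving from LuceneTestCase that writes the unqualified
// [Test] will bind to this nested attribute instead of
// NUnit.Framework.TestAttribute.
public abstract partial class LuceneTestCase
{
    [AttributeUsage(AttributeTargets.Method, AllowMultiple = false, Inherited = true)]
    public class TestAttribute : NUnit.Framework.TestAttribute
    {
        // Any custom test framework behavior (randomization hooks,
        // repeat logic, etc.) would be layered in here.
    }
}
```

With that in place, writing `[NUnit.Framework.Test]` explicitly (as in the diff above) would silently bypass the customization, which is the "time bomb" being described.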
##########
src/Lucene.Net.Tests/Analysis/TokenAttributes/TestCharTermAttributeImpl.cs:
##########

@@ -230,9 +230,7 @@ public virtual void TestAppendableInterface()
             t.Append((ICharSequence)t2, 1, 5 - 1); // LUCENENET: Corrected 3rd parameter

Review Comment:
Above, we should probably restore all of the `CharBuffer` tests, since we have [CharBuffer](https://github.com/NightOwl888/J2N/blob/3239ce6e3264e330efc2234f2490000c40cf67fd/src/J2N/IO/CharBuffer.cs) in J2N. I am pretty sure we can eliminate most of the `ICharSequence` members in the project, but I am not sure yet whether it makes sense to eliminate `IAppendable` as well. Most likely, `ICharTermAttribute` will still implement both of them.

We should also have tests for `ReadOnlySpan<char>` (which should be pretty much identical to either the `char[]` or `ICharSequence` tests). There is an extension method on `IAppendable` that accepts `ReadOnlySpan<char>`, but it provides a degraded experience compared to a type that implements `ISpanAppendable` and does the optimizations. So, we can do the tests now. Whether you want to jump ahead and provide an optimized implementation of `Append(ReadOnlySpan<char>)` I will leave up to you.

> NOTE: There is no overload of `Append(ReadOnlySpan<char>, int, int)` since it is unnecessary. We can simply slice the `ReadOnlySpan<char>` before passing it into the method.

Ditto for the `TestAppendableInterfaceWithLongSequences()` test.
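The slicing point in the note above can be illustrated with plain `StringBuilder`, which already has a single-argument `Append(ReadOnlySpan<char>)` overload; an `ICharTermAttribute` overload would presumably be used the same way:

```c#
ReadOnlySpan<char> span = "hello world".AsSpan();
var sb = new StringBuilder();

// Instead of a hypothetical Append(span, offset, length) overload,
// slice the span first and pass the slice to the single-argument overload.
sb.Append(span.Slice(6, 5)); // appends "world"
```

Since `Slice` is a cheap O(1) operation that allocates nothing, an `(int, int)` overload would add API surface without any performance benefit.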
##########
src/Lucene.Net.Tests/Codecs/Lucene40/TestLucene40PostingsReader.cs:
##########

@@ -44,16 +44,14 @@ namespace Lucene.Net.Codecs.Lucene40
     [TestFixture]
     public class TestLucene40PostingsReader : LuceneTestCase
     {
-        internal static readonly string[] terms = LoadTerms();
+        internal static readonly string[] terms = new string[100];

-        static string[] LoadTerms()
+        static TestLucene40PostingsReader()

Review Comment:
We [should be avoiding static constructors](https://github.com/apache/lucenenet/pull/224#issuecomment-469284006), so let's stick with the original implementation here. In other places, this was commented when setting the static variable, like so:

```c#
internal static readonly string[] terms = LoadTerms(); // LUCENENET: Avoid static constructors (see https://github.com/apache/lucenenet/pull/224#issuecomment-469284006)
```

##########
src/Lucene.Net.Tests/Document/TestBinaryDocument.cs:
##########

@@ -95,24 +89,35 @@ public virtual void TestCompressionTools()
             IIndexableField binaryFldCompressed = new StoredField("binaryCompressed", CompressionTools.Compress(binaryValCompressed.GetBytes(Encoding.UTF8)));
             IIndexableField stringFldCompressed = new StoredField("stringCompressed", CompressionTools.CompressString(binaryValCompressed));

-            var doc = new Documents.Document {binaryFldCompressed, stringFldCompressed};
+            var doc = new Document();

Review Comment:
Actually, we should use the collection initializer here. Users who run across this should be aware there is an alternative to using the `.Add()` method. Not to mention, we should be testing the collection initializer. But do spread it out over multiple lines so it looks similar to the Java code.
```c#
var doc = new Document
{
    binaryFldCompressed, stringFldCompressed
};
```

##########
src/Lucene.Net.Tests/Analysis/TestNumericTokenStream.cs:
##########

@@ -55,12 +56,14 @@ public virtual void TestLongStream()
             }
             Assert.IsFalse(stream.IncrementToken(), "More tokens available");
             stream.End();
+            // LUCENENET specific - stream disposed above via using statement
+            // stream.Dispose();
         }

         [NUnit.Framework.Test]

Review Comment:
See my note above about why we shouldn't be using explicit namespaces for NUnit attributes.

##########
src/Lucene.Net.Tests/Codecs/Lucene41/TestForUtil.cs:
##########

@@ -3,6 +3,10 @@
 using RandomizedTesting.Generators;
 using Assert = Lucene.Net.TestFramework.Assert;
 using RandomInts = RandomizedTesting.Generators.RandomNumbers;
+#if !FEATURE_RANDOM_NEXTINT64_NEXTSINGLE
+using RandomizedTesting.Generators;

Review Comment:
We have this declared twice after adding it here, and we are now getting warnings. Please fix this.

```
The using directive for 'RandomizedTesting.Generators' appeared previously in this namespace
```

##########
src/Lucene.Net.Tests/Analysis/TestNumericTokenStream.cs:
##########

@@ -112,6 +116,7 @@
         public interface ITestAttribute : ICharTermAttribute
         {
        }

+        // ReSharper disable once UnusedType.Global - presumably used to test ability to add derived attributes
        public class TestAttribute : CharTermAttribute, ITestAttribute
        {
        }

Review Comment:
See my note above about why we shouldn't be using explicit namespaces for NUnit attributes (2 lines below this).
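One way to resolve the duplicate-using warning flagged above for `TestForUtil.cs` would be to rely on the existing unconditional directive and leave the `#if` block without a second declaration. A sketch (surrounding directives abbreviated to those visible in the diff):

```c#
using RandomizedTesting.Generators; // single, unconditional declaration
using Assert = Lucene.Net.TestFramework.Assert;
using RandomInts = RandomizedTesting.Generators.RandomNumbers;

#if !FEATURE_RANDOM_NEXTINT64_NEXTSINGLE
// The extension methods are still needed on older target frameworks,
// but the unconditional using directive above already covers every
// configuration, so no second declaration is required here.
#endif
```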
##########
src/Lucene.Net.Tests/Document/TestBinaryDocument.cs:
##########

@@ -95,24 +89,35 @@ public virtual void TestCompressionTools()
             IIndexableField binaryFldCompressed = new StoredField("binaryCompressed", CompressionTools.Compress(binaryValCompressed.GetBytes(Encoding.UTF8)));
             IIndexableField stringFldCompressed = new StoredField("stringCompressed", CompressionTools.CompressString(binaryValCompressed));

-            var doc = new Documents.Document {binaryFldCompressed, stringFldCompressed};
+            var doc = new Document();
+            doc.Add(binaryFldCompressed);
+            doc.Add(stringFldCompressed);
+
+            // add the doc to a ram index
             using Directory dir = NewDirectory();
             using RandomIndexWriter writer = new RandomIndexWriter(Random, dir);
             writer.AddDocument(doc);

+            // open a reader and fetch the document
             using IndexReader reader = writer.GetReader();
             Document docFromReader = reader.Document(0);
             Assert.IsTrue(docFromReader != null);

+            // fetch the binary compressed field and compare it's content with the original one
+            // LUCENENET: was `= new String(CompressionTools.decompress(docFromReader.getBinaryValue("binaryCompressed")), StandardCharsets.UTF_8);`
             string binaryFldCompressedTest = Encoding.UTF8.GetString(

Review Comment:
This was incorrectly translated. While we could be using `IOUtils.CHARSET_UTF_8` rather than `Encoding.UTF8` here because the former doesn't use a BOM, I don't think it matters when calling `GetString()`. However, we probably should be explicit in case something changes in the future.

Just to stay consistent with Java, we should add an internal `StandardCharsets` static class with a `UTF_8` field in Support, which would help prevent porting errors like this in the future. It should be set to the same value as `IOUtils.CHARSET_UTF_8` (ideally, sharing the same instance).
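A minimal sketch of such a class follows. The namespace and doc comment are assumptions, and whether it shares the `IOUtils.CHARSET_UTF_8` instance or duplicates it is an open design question:

```c#
using System.Text;

namespace Lucene.Net.Support
{
    /// <summary>
    /// Sketch of a class mirroring Java's java.nio.charset.StandardCharsets,
    /// intended to reduce BOM-related porting errors. Hypothetical, for discussion.
    /// </summary>
    internal static class StandardCharsets
    {
        // UTF-8 *without* a byte order mark, unlike the Encoding.UTF8
        // singleton, matching the behavior of Java's StandardCharsets.UTF_8.
        // Ideally this would be the same instance as IOUtils.CHARSET_UTF_8.
        public static readonly Encoding UTF_8 =
            new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
    }
}
```

A ported line would then read `StandardCharsets.UTF_8.GetString(...)`, visually matching the Java source and making a BOM mistranslation easy to spot in review.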
Also, we should do a review of all encoding calls to make sure we are correctly matching Lucene when using or not using a BOM (basically, changing all of these lines to use `StandardCharsets` rather than `Encoding`). I know this came up in prior test reviews. I searched for "bom", "byte" and "encoding", but don't see where an issue has been opened about this previously, so if it is listed as something else, please let me know. Otherwise, please open an issue for this.

It may not look like C#, but it is less likely to go wrong. This is similar to the approach we took with error handling: there were so many mistranslated exceptions during the port, and they have a different inheritance hierarchy, that we couldn't be sure whether they were making it to the correct exception handler.

##########
src/Lucene.Net.Tests/Analysis/TokenAttributes/TestCharTermAttributeImpl.cs:
##########

@@ -241,26 +239,20 @@ public virtual void TestAppendableInterface()
             t.Append((ICharSequence)t2, 1, 0 - 1); // LUCENENET: Corrected 3rd parameter
             Assert.Fail("Should throw ArgumentOutOfRangeException");
         }
-#pragma warning disable 168
-        catch (ArgumentOutOfRangeException iobe)
-#pragma warning restore 168
+        catch (ArgumentOutOfRangeException /*iobe*/)
         {
         }

-        string expected = t.ToString();
-        t.Append((ICharSequence)null); // No-op
-        Assert.AreEqual(expected, t.ToString());
-
+        t.Append((ICharSequence)null); // LUCENENET specific: No-op for appending null
+        Assert.AreEqual("4teste", t.ToString()); // LUCENENET specific: different assertion

Review Comment:
Let's make this message more informative as to why we made this decision.

```c#
// LUCENENET specific: Excluding "null" from the result string because in .NET `StringBuilder.Append((string)null)` is a no-op rather than appending the string "null", so we do the same to match.
```

##########
src/Lucene.Net.Tests/Codecs/Lucene40/TestBitVector.cs:
##########

@@ -44,14 +44,15 @@ public virtual void TestConstructSize()
             DoTestConstructOfSize(1000);
         }

-        private void DoTestConstructOfSize(int n)
+        // LUCENENET specific - made static
+        private static void DoTestConstructOfSize(int n)
        {
            BitVector bv = new BitVector(n);
            Assert.AreEqual(n, bv.Length); // LUCENENET NOTE: Length is the equivalent of size()
        }

        /// <summary>
-        /// Test the get() and set() methods on BitVectors of various sizes.
+        /// Test the <see cref="BitVector.Get(int)"/> and <see cref="BitVector.Set(int)"/> methods on BitVectors of various sizes.

Review Comment:
I was looking at `BitVector` to see whether it is sensible to make these into `this[int] { get; set; }` instead of (or in addition to) `Get(int)` and `Set(int)`. Being that there is a `GetAndSet()` method (much like the atomic classes), I am thinking not.

But I discovered that inside several of the methods of `BitVector`, we are throwing `IndexOutOfRangeException`. `IndexOutOfRangeException` is a special exception type in .NET that should only be thrown from the accessors of an indexer. In virtually all other cases, these should be converted to `ArgumentOutOfRangeException`. It looks like a handful of other types are throwing it as well. We should open a new issue to convert these to `ArgumentOutOfRangeException` and double-check that there will be no adverse effects from making the change.
These were originally intentionally normalized to `IndexOutOfRangeException` to make the error handling simpler, but since then we have fixed the error handling to catch either `IndexOutOfRangeException` or `ArgumentOutOfRangeException` and treat them the same using the [`IsIndexOutOfBoundsException()` extension method](https://github.com/apache/lucenenet/blob/92029e8987076ff7264c4d5c8ce28ea77af9bd26/src/Lucene.Net/Support/ExceptionHandling/ExceptionExtensions.cs#L260-L266) (although it may not yet be used everywhere it is needed to make the change).

I also noticed when searching for `IndexOutOfRangeException` that `UTF8toUTF16` can throw it as well. This is unusual to have to deal with, so we should change this method to `TryUTF8toUTF16` so we can eliminate this exception, which is clearly being used exclusively for control flow when the UTF-8 format is invalid. It is caught in several places to do a fallback, and we should fix this. So, we will need a new issue for that as well. Note that we will most likely be converting the `byte[]` overload to use `ReadOnlySpan<byte>` and getting rid of the `offset` and `length` parameters.

##########
src/Lucene.Net.Tests/Analysis/TestReusableStringReader.cs:
##########

@@ -30,42 +30,38 @@ public virtual void Test()
         {
             char[] buf = new char[4];

-            using (ReusableStringReader reader = new ReusableStringReader())
-            {
-                Assert.AreEqual(-1, reader.Read());
-                Assert.AreEqual(-1, reader.Read(new char[1], 0, 1));
-                Assert.AreEqual(-1, reader.Read(new char[2], 1, 1));
-                //Assert.AreEqual(-1, reader.Read(CharBuffer.wrap(new char[2])));
+            ReusableStringReader reader = new ReusableStringReader();

Review Comment:
`ReusableStringReader` is internal in Java, but we made it public.

https://github.com/apache/lucene/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/analysis/ReusableStringReader.java#L23

I tried changing it to internal and it still compiles, so let's mark this internal (breaking change).
This is the second accessibility issue we have found recently in the Lucene.Net assembly. I am pretty sure I went over every member to manually ensure we matched accessibility, but it was such a daunting task that it put me to sleep a few times. Not to mention, making some things internal wasn't possible at first because there were other things blocking it that may no longer be blocking.

I asked ChatGPT about it, and it suggested comparing the metadata for the API docs in DocFX and javadoc: https://chatgpt.com/share/6735ea5d-5ecc-8005-98df-cf3c65e4795c. It seems like a bit of work, but it would be considerably less work than doing the task manually again. So, I think we should have a new issue for this. It is probably worth it if we can minimize the number of APIs that we have declared public in the release. Plus, if we have a tool that we can use to compare accessibility across a whole assembly, it would make keeping this in sync simpler for the long term.

##########
src/Lucene.Net.Tests/Codecs/Compressing/TestCompressingStoredFieldsFormat.cs:
##########

@@ -35,6 +35,7 @@ namespace Lucene.Net.Codecs.Compressing
     using MockAnalyzer = Lucene.Net.Analysis.MockAnalyzer;
     using RandomIndexWriter = Lucene.Net.Index.RandomIndexWriter;

+    // LUCENENET: Moved @Repeat(iterations=5) to the method level

Review Comment:
Good catch. However, this is not equivalent to Lucene, because there are tests in `BaseStoredFieldsFormatTestCase` and `BaseIndexFileFormatTestCase` that should also be repeated 5 times for this codec. Please move the `[Repeat(5)]` attribute back to the class level.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org