NightOwl888 commented on code in PR #1018:
URL: https://github.com/apache/lucenenet/pull/1018#discussion_r1842096611
##########
src/Lucene.Net.Tests/Analysis/TestNumericTokenStream.cs:
##########

@@ -33,10 +32,12 @@ public class TestNumericTokenStream : BaseTokenStreamTestCase
         internal const long lvalue = 4573245871874382L;
         internal const int ivalue = 123456;

+        // LUCENENET note - Have to explicitly use NUnit.Framework.TestAttribute in this class
+        // due to conflict with TestAttribute below
         [NUnit.Framework.Test]

Review Comment:
This is a potential time bomb. If there is a test framework feature that requires us to use a custom `TestAttribute` to override what NUnit does, this will bypass our custom attribute. To override NUnit's attribute, we would nest `TestAttribute` within `LuceneTestCase` so it takes precedence over NUnit's. We already do this with the `TestFixtureAttribute`, and AFAIK NUnit doesn't provide any avenue for extension other than creating custom attributes.

So, I think we should rename our `TestAttribute` below to something else (perhaps `TestFakeAttribute` and `ITestFakeAttribute`) to distinguish it from NUnit's attribute. Then we don't have to use an explicit namespace to use NUnit's `TestAttribute`, and if we customize NUnit's attribute, this won't silently fail to use our test framework over NUnit.

##########
src/Lucene.Net.Tests/Analysis/TestNumericTokenStream.cs:
##########

@@ -79,7 +82,8 @@ public virtual void TestIntStream()
             }
             Assert.IsFalse(stream.IncrementToken(), "More tokens available");
             stream.End();
-            stream.Dispose();
+            // LUCENENET specific - stream disposed above via using statement
+            // stream.Dispose();
         }

         [NUnit.Framework.Test]

Review Comment:
See my note above about why we shouldn't be using explicit namespaces for NUnit attributes.
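For reference, the nesting approach described above (the one we already use for `TestFixtureAttribute`) might look something like the following. This is only an illustrative sketch, not the actual test framework code; the attribute body and usage flags are assumptions:

```c#
// Hypothetical sketch - not the real LuceneTestCase source.
// Because nested type names shadow names imported via using directives,
// any test class deriving from LuceneTestCase that writes the unqualified
// [Test] will bind to this nested attribute instead of
// NUnit.Framework.TestAttribute.
public abstract partial class LuceneTestCase
{
    [AttributeUsage(AttributeTargets.Method, AllowMultiple = false, Inherited = true)]
    public class TestAttribute : NUnit.Framework.TestAttribute
    {
        // Any custom test framework behavior (randomization hooks,
        // repeat logic, etc.) would be layered in here.
    }
}
```

With that in place, writing `[NUnit.Framework.Test]` explicitly (as in the diff above) would silently bypass the customization, which is the "time bomb" being described.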
##########
src/Lucene.Net.Tests/Analysis/TokenAttributes/TestCharTermAttributeImpl.cs:
##########

@@ -230,9 +230,7 @@ public virtual void TestAppendableInterface()
             t.Append((ICharSequence)t2, 1, 5 - 1); // LUCENENET: Corrected 3rd parameter

Review Comment:
Above, we should probably restore all of the `CharBuffer` tests, since we have [CharBuffer](https://github.com/NightOwl888/J2N/blob/3239ce6e3264e330efc2234f2490000c40cf67fd/src/J2N/IO/CharBuffer.cs) in J2N. I am pretty sure we can eliminate most of the `ICharSequence` members in the project, but I am not sure yet whether it makes sense to eliminate `IAppendable` as well. Most likely, `ICharTermAttribute` will still implement both of them.

We should also have tests for `ReadOnlySpan<char>` (which should be pretty much identical to either the `char[]` or `ICharSequence` tests). There is an extension method on `IAppendable` that accepts `ReadOnlySpan<char>`, but it provides a degraded experience compared to a type that implements `ISpanAppendable` and does the optimizations. So, we can do the tests now. Whether you want to jump ahead and provide an optimized implementation of `Append(ReadOnlySpan<char>)` I will leave up to you.

> NOTE: There is no overload of `Append(ReadOnlySpan<char>, int, int)` since it is unnecessary. We can simply slice the `ReadOnlySpan<char>` before passing it into the method.

Ditto for the `TestAppendableInterfaceWithLongSequences()` test.
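The slicing point in the note above can be illustrated with plain `StringBuilder`, which already has a single-argument `Append(ReadOnlySpan<char>)` overload; an `ICharTermAttribute` overload would presumably be used the same way:

```c#
ReadOnlySpan<char> span = "hello world".AsSpan();
var sb = new StringBuilder();

// Instead of a hypothetical Append(span, offset, length) overload,
// slice the span first and pass the slice to the single-argument overload.
sb.Append(span.Slice(6, 5)); // appends "world"
```

Since `Slice` is a cheap O(1) operation that allocates nothing, an `(int, int)` overload would add API surface without any performance benefit.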
##########
src/Lucene.Net.Tests/Codecs/Lucene40/TestLucene40PostingsReader.cs:
##########

@@ -44,16 +44,14 @@ namespace Lucene.Net.Codecs.Lucene40
     [TestFixture]
     public class TestLucene40PostingsReader : LuceneTestCase
     {
-        internal static readonly string[] terms = LoadTerms();
+        internal static readonly string[] terms = new string[100];

-        static string[] LoadTerms()
+        static TestLucene40PostingsReader()

Review Comment:
We [should be avoiding static constructors](https://github.com/apache/lucenenet/pull/224#issuecomment-469284006), so let's stick with the original implementation here. In other places, this was commented when setting the static variable, like so:

```c#
internal static readonly string[] terms = LoadTerms(); // LUCENENET: Avoid static constructors (see https://github.com/apache/lucenenet/pull/224#issuecomment-469284006)
```

##########
src/Lucene.Net.Tests/Document/TestBinaryDocument.cs:
##########

@@ -95,24 +89,35 @@ public virtual void TestCompressionTools()
             IIndexableField binaryFldCompressed = new StoredField("binaryCompressed", CompressionTools.Compress(binaryValCompressed.GetBytes(Encoding.UTF8)));
             IIndexableField stringFldCompressed = new StoredField("stringCompressed", CompressionTools.CompressString(binaryValCompressed));

-            var doc = new Documents.Document {binaryFldCompressed, stringFldCompressed};
+            var doc = new Document();

Review Comment:
Actually, we should use the collection initializer here. Users who run across this should be aware there is an alternative to using the `.Add()` method. Not to mention, we should be testing the collection initializer. But do spread it out over multiple lines so it looks similar to the Java code.
```c#
var doc = new Document
{
    binaryFldCompressed, stringFldCompressed
};
```

##########
src/Lucene.Net.Tests/Analysis/TestNumericTokenStream.cs:
##########

@@ -55,12 +56,14 @@ public virtual void TestLongStream()
             }
             Assert.IsFalse(stream.IncrementToken(), "More tokens available");
             stream.End();
+            // LUCENENET specific - stream disposed above via using statement
+            // stream.Dispose();
         }

         [NUnit.Framework.Test]

Review Comment:
See my note above about why we shouldn't be using explicit namespaces for NUnit attributes.

##########
src/Lucene.Net.Tests/Codecs/Lucene41/TestForUtil.cs:
##########

@@ -3,6 +3,10 @@
 using RandomizedTesting.Generators;
 using Assert = Lucene.Net.TestFramework.Assert;
 using RandomInts = RandomizedTesting.Generators.RandomNumbers;
+#if !FEATURE_RANDOM_NEXTINT64_NEXTSINGLE
+using RandomizedTesting.Generators;

Review Comment:
We have this declared twice after adding it here, and we are now getting warnings. Please fix this.

```
The using directive for 'RandomizedTesting.Generators' appeared previously in this namespace
```

##########
src/Lucene.Net.Tests/Analysis/TestNumericTokenStream.cs:
##########

@@ -112,6 +116,7 @@
         public interface ITestAttribute : ICharTermAttribute
         {
        }

+        // ReSharper disable once UnusedType.Global - presumably used to test ability to add derived attributes
        public class TestAttribute : CharTermAttribute, ITestAttribute
        {
        }

Review Comment:
See my note above about why we shouldn't be using explicit namespaces for NUnit attributes (2 lines below this).
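One way to resolve the duplicate-using warning flagged above for `TestForUtil.cs` would be to rely on the existing unconditional directive and leave the `#if` block without a second declaration. A sketch (surrounding directives abbreviated to those visible in the diff):

```c#
using RandomizedTesting.Generators; // single, unconditional declaration
using Assert = Lucene.Net.TestFramework.Assert;
using RandomInts = RandomizedTesting.Generators.RandomNumbers;

#if !FEATURE_RANDOM_NEXTINT64_NEXTSINGLE
// The extension methods are still needed on older target frameworks,
// but the unconditional using directive above already covers every
// configuration, so no second declaration is required here.
#endif
```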
##########
src/Lucene.Net.Tests/Document/TestBinaryDocument.cs:
##########

@@ -95,24 +89,35 @@ public virtual void TestCompressionTools()
             IIndexableField binaryFldCompressed = new StoredField("binaryCompressed", CompressionTools.Compress(binaryValCompressed.GetBytes(Encoding.UTF8)));
             IIndexableField stringFldCompressed = new StoredField("stringCompressed", CompressionTools.CompressString(binaryValCompressed));

-            var doc = new Documents.Document {binaryFldCompressed, stringFldCompressed};
+            var doc = new Document();
+            doc.Add(binaryFldCompressed);
+            doc.Add(stringFldCompressed);
+
+            // add the doc to a ram index
             using Directory dir = NewDirectory();
             using RandomIndexWriter writer = new RandomIndexWriter(Random, dir);
             writer.AddDocument(doc);

+            // open a reader and fetch the document
             using IndexReader reader = writer.GetReader();
             Document docFromReader = reader.Document(0);
             Assert.IsTrue(docFromReader != null);

+            // fetch the binary compressed field and compare it's content with the original one
+            // LUCENENET: was `= new String(CompressionTools.decompress(docFromReader.getBinaryValue("binaryCompressed")), StandardCharsets.UTF_8);`
             string binaryFldCompressedTest = Encoding.UTF8.GetString(

Review Comment:
This was incorrectly translated. While we could be using `IOUtils.CHARSET_UTF_8` rather than `Encoding.UTF8` here because the former doesn't use a BOM, I don't think it matters when calling `GetString()`. However, we probably should be explicit in case something changes in the future.

Just to stay consistent with Java, we should add an internal `StandardCharsets` static class with a `UTF_8` field in Support, which would help prevent porting errors like this in the future. It should be set to the same value as `IOUtils.CHARSET_UTF_8` (ideally, sharing the same instance).
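A minimal sketch of such a class follows. The namespace and doc comment are assumptions, and whether it shares the `IOUtils.CHARSET_UTF_8` instance or duplicates it is an open design question:

```c#
using System.Text;

namespace Lucene.Net.Support
{
    /// <summary>
    /// Sketch of a class mirroring Java's java.nio.charset.StandardCharsets,
    /// intended to reduce BOM-related porting errors. Hypothetical, for discussion.
    /// </summary>
    internal static class StandardCharsets
    {
        // UTF-8 *without* a byte order mark, unlike the Encoding.UTF8
        // singleton, matching the behavior of Java's StandardCharsets.UTF_8.
        // Ideally this would be the same instance as IOUtils.CHARSET_UTF_8.
        public static readonly Encoding UTF_8 =
            new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
    }
}
```

A ported line would then read `StandardCharsets.UTF_8.GetString(...)`, visually matching the Java source and making a BOM mistranslation easy to spot in review.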
Also, we should do a review of all encoding calls to make sure we are correctly matching Lucene when using or not using a BOM (basically, changing all of these lines to use `StandardCharsets` rather than `Encoding`). I know this came up in prior test reviews. I searched for "bom", "byte" and "encoding", but don't see where an issue has been opened about this previously, so if it is listed as something else, please let me know. Otherwise, please open an issue for this.

It may not look like C#, but it is less likely to go wrong. This is similar to the approach we took with error handling: there were so many mistranslated exceptions during the port, and they have a different inheritance hierarchy, that we couldn't be sure whether they were making it to the correct exception handler.

##########
src/Lucene.Net.Tests/Analysis/TokenAttributes/TestCharTermAttributeImpl.cs:
##########

@@ -241,26 +239,20 @@ public virtual void TestAppendableInterface()
             t.Append((ICharSequence)t2, 1, 0 - 1); // LUCENENET: Corrected 3rd parameter
             Assert.Fail("Should throw ArgumentOutOfRangeException");
         }
-#pragma warning disable 168
-        catch (ArgumentOutOfRangeException iobe)
-#pragma warning restore 168
+        catch (ArgumentOutOfRangeException /*iobe*/)
         {
         }

-        string expected = t.ToString();
-        t.Append((ICharSequence)null); // No-op
-        Assert.AreEqual(expected, t.ToString());
-
+        t.Append((ICharSequence)null); // LUCENENET specific: No-op for appending null
+        Assert.AreEqual("4teste", t.ToString()); // LUCENENET specific: different assertion

Review Comment:
Let's make this message more informative as to why we made this decision.

```c#
// LUCENENET specific: Excluding "null" from the result string because in .NET `StringBuilder.Append((string)null)` is a no-op rather than appending the string "null", so we do the same to match.
```

##########
src/Lucene.Net.Tests/Codecs/Lucene40/TestBitVector.cs:
##########

@@ -44,14 +44,15 @@ public virtual void TestConstructSize()
             DoTestConstructOfSize(1000);
         }

-        private void DoTestConstructOfSize(int n)
+        // LUCENENET specific - made static
+        private static void DoTestConstructOfSize(int n)
        {
            BitVector bv = new BitVector(n);
            Assert.AreEqual(n, bv.Length); // LUCENENET NOTE: Length is the equivalent of size()
        }

        /// <summary>
-        /// Test the get() and set() methods on BitVectors of various sizes.
+        /// Test the <see cref="BitVector.Get(int)"/> and <see cref="BitVector.Set(int)"/> methods on BitVectors of various sizes.

Review Comment:
I was looking at `BitVector` to see whether it is sensible to make these into `this[int] { get; set; }` instead of (or in addition to) `Get(int)` and `Set(int)`. Being that there is a `GetAndSet()` method (much like the atomic classes), I am thinking not.

But I discovered that inside several of the methods of `BitVector`, we are throwing `IndexOutOfRangeException`. `IndexOutOfRangeException` is a special exception type in .NET that should only be thrown from the accessors of an indexer. In virtually all other cases, these should be converted to `ArgumentOutOfRangeException`. It looks like a handful of other types are throwing it as well. We should open a new issue to convert these to `ArgumentOutOfRangeException` and double-check that there will be no adverse effects from making the change.
These were originally intentionally normalized to `IndexOutOfRangeException` to make the error handling simpler, but since then we have fixed the error handling to catch either `IndexOutOfRangeException` or `ArgumentOutOfRangeException` and treat them the same using the [`IsIndexOutOfBoundsException()` extension method](https://github.com/apache/lucenenet/blob/92029e8987076ff7264c4d5c8ce28ea77af9bd26/src/Lucene.Net/Support/ExceptionHandling/ExceptionExtensions.cs#L260-L266) (although it may not yet be used everywhere it is needed to make the change).

I also noticed when searching for `IndexOutOfRangeException` that `UTF8toUTF16` can throw it as well. This is unusual to have to deal with, so we should change this method to `TryUTF8toUTF16` so we can eliminate this exception, which is clearly being used exclusively for control flow when the UTF-8 format is invalid. It is caught in several places to do a fallback, and we should fix this. So, we will need a new issue for that as well. Note that we will most likely be converting the `byte[]` overload to use `ReadOnlySpan<byte>` and getting rid of the `offset` and `length` parameters.

##########
src/Lucene.Net.Tests/Analysis/TestReusableStringReader.cs:
##########

@@ -30,42 +30,38 @@ public virtual void Test()
         {
             char[] buf = new char[4];

-            using (ReusableStringReader reader = new ReusableStringReader())
-            {
-                Assert.AreEqual(-1, reader.Read());
-                Assert.AreEqual(-1, reader.Read(new char[1], 0, 1));
-                Assert.AreEqual(-1, reader.Read(new char[2], 1, 1));
-                //Assert.AreEqual(-1, reader.Read(CharBuffer.wrap(new char[2])));
+            ReusableStringReader reader = new ReusableStringReader();

Review Comment:
`ReusableStringReader` is internal in Java, but we made it public.

https://github.com/apache/lucene/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/analysis/ReusableStringReader.java#L23

I tried changing it to internal and it still compiles, so let's mark this internal (breaking change).
This is the second accessibility issue we have found recently in the Lucene.Net assembly. I am pretty sure I went over every member to manually ensure we matched accessibility, but it was such a daunting task that it put me to sleep a few times. Not to mention, making some things internal wasn't possible at first because there were other things blocking it that may no longer be blocking.

I asked ChatGPT about it, and it suggested comparing the metadata for the API docs in DocFX and javadoc: https://chatgpt.com/share/6735ea5d-5ecc-8005-98df-cf3c65e4795c. It seems like a bit of work, but it would be considerably less work than doing the task manually again. So, I think we should have a new issue for this. It is probably worth it if we can minimize the number of APIs that we have declared public in the release. Plus, if we have a tool that we can use to compare accessibility across a whole assembly, it would make keeping this in sync simpler for the long term.

##########
src/Lucene.Net.Tests/Codecs/Compressing/TestCompressingStoredFieldsFormat.cs:
##########

@@ -35,6 +35,7 @@ namespace Lucene.Net.Codecs.Compressing
     using MockAnalyzer = Lucene.Net.Analysis.MockAnalyzer;
     using RandomIndexWriter = Lucene.Net.Index.RandomIndexWriter;

+    // LUCENENET: Moved @Repeat(iterations=5) to the method level

Review Comment:
Good catch. However, this is not equivalent to Lucene, because there are tests in `BaseStoredFieldsFormatTestCase` and `BaseIndexFileFormatTestCase` that should also be repeated 5 times for this codec. Please move the `[Repeat(5)]` attribute back to the class level.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org