NightOwl888 commented on code in PR #1154:
URL: https://github.com/apache/lucenenet/pull/1154#discussion_r2051756865


##########
src/Lucene.Net.Tests.Analysis.SmartCn/DictionaryTests.cs:
##########
@@ -0,0 +1,72 @@
+using Lucene.Net.Util;
+using Lucene.Net.Analysis.Cn.Smart.Hhmm;
+using Lucene.Net.Attributes;
+using NUnit.Framework;
+using System;
+using System.IO;
+using System.Reflection;
+
+
+[TestFixture]
+[LuceneNetSpecific]
+public class DictionaryTests : LuceneTestCase

Review Comment:
   Actually, there is no reason to add a reference on the J2N package because 
it is already referenced by the `Lucene.Net` assembly. All submodules depend on 
`Lucene.Net` and J2N is a transitive (implicit) dependency that is loaded 
automatically.
   
   As for the extension method, there is a guide 
[here](https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/how-to-implement-and-call-a-custom-extension-method)
 that demonstrates how to call them. You need to add the `using` directive of 
the namespace where the class the extension method is defined in is located. In 
this case, the `J2N` namespace.
   
   ```c#
   using J2N;
   ```
   
   > TIP: If you are using Visual Studio 2022, it will display a red underline 
when it doesn't recognize a method call and if you hover over the underlined 
text, it will suggest options such importing the namespace for the extension 
method and show namespaces where extension methods that fit the method 
signature are located.
   
   ## FindAndGetManifestResourceStream
   
   BTW - I made a couple of errors when I provided instructions previously. 
   
   First of all, the method is named `FindAndGetManifestResourceStream()`, not 
`FindAndGetResourceStream()`.
   
   Secondly, for the `FindAndGetManifestResourceStream()` method to function, 
the namespace and directory structure that the resource is in must match 
exactly *or* the best match with the assembly namespace prefix and the name 
suffix passed into `FindAndGetManifestResourceStream()` used must be resolvable 
to the complete name. So, the simplest approach here would be to put the 
`DictionaryTests` class and resource files in the `Hhmm` directory instead of 
the `Support/Hhmm` directory, otherwise we would need to hard code `Support.` 
into the path of the resource for it to be found. This is the approach we took 
with other tests that we added, such as [`TestBuildDictionary.cs` in the 
kuromoji 
module](https://github.com/apache/lucenenet/blob/a0578d6ea2c17c06be925adb15acd3c64d5fc824/src/Lucene.Net.Tests.Analysis.Kuromoji/Tools/TestBuildDictionary.cs).
   
   The code snippet above looks great! There are only 6 changes I would suggest:
   
   ### Change 1
   
   We will need to add the line `base.OneTimeSetUp();` before executing 
anything inside of the `OneTimeSetUp()` method (a detail I forgot to mention). 
This is required by `LuceneTestCase` to function correctly.
   
   ```c#
           public override void OneTimeSetUp()
           {
               base.OneTimeSetUp();
               tempDir = CreateTempDir("smartcn-data");
               AnalyzerProfile.ANALYSIS_DATA_DIR = tempDir.FullName;
   
               CopyResourceToFile(BigramFileName);
               CopyResourceToFile(CoreFileName);
           }
   ```
   
   ### Change 2
   
   It is slightly more efficient to call the method using the `typeof()` 
operator instead of `this.GetType()`.
   
   ```c#
   using var resourceStream = 
typeof(DictionaryTests).FindAndGetManifestResourceStream(resourceName);
   ```
   
   > TIP: To determine the resource path that the 
`FindAndGetManifestResourceStream(resourceName)` method found, there is another 
extension method named `FindResource(string)` that executes the same logic and 
returns a string with the .NET resource path rather than a `Stream`.
   
   ### Change 3
   
   Please eliminate the variables and inline the calls to `Assert.AreEqual()` 
in the `WordDictionary` tests as was the case in `TestBigramDictionary()`.
   
   ```c#
   WordDictionary wordDict = WordDictionary.GetInstance();
   Assert.AreEqual(30, wordDict.GetFrequency("尼".ToCharArray()), "Frequency for 
'尼' is incorrect.");
   Assert.AreEqual(0, wordDict.GetFrequency("missing".ToCharArray()), "Expected 
frequency 0 for unknown word.");
   ```
   
   ### Change 4
   
   Please rename the test methods to be consistent. Let's change the name from 
`TestWordDictionaryGetInstance()` to `TestWordDictionary()`.
   
   ### Change 5
   
   To be consistent with kuromoji, let's change the name of the test class (and 
file) to `TestBuildDictionary`. As I previously  mentioned, leave the namspace 
as `Lucene.Net.Analysis.Cn.Smart.Hhmm` but place the file in the `Hhmm` 
directory along with the resource files. I tested 
`FindAndGetManifestResourceStream()` and it will locate them with these 
conventions.
   
   I will leave it up to you whether to put the `.dict` files into a `.zip` 
file and use `TestUtil.Unzip()` to copy them instead of using individual files 
and a custom copy method. The [`TestBuildDictionary.cs` in the kuromoji 
module](https://github.com/apache/lucenenet/blob/a0578d6ea2c17c06be925adb15acd3c64d5fc824/src/Lucene.Net.Tests.Analysis.Kuromoji/Tools/TestBuildDictionary.cs)
 has an example of the zipped approach, which can be accomplished with less 
code.
   
   ### Change 6
   
   Note that we are specifically testing *building* the dictionaries. Most 
users will likely use the built-in ones, but this feature allows customization. 
When `ANALYSIS_DATA_DIR` is set, the first call to `GetInstance()` will build 
the dictionary and then call `SaveToObj()` which creates the `.mem` file (a 
binary file) in the `ANALYSIS_DATA_DIR`. After that point, the `.dict` file can 
be deleted as all future calls to `GetInstance()` will load the data from the 
binary `.mem` file.
   
   As such, let's add additional asserts in each test to make sure that the 
`.mem` file is created in the temp folder and that subsequent calls to 
`GetInstance()` can still load the data from the `.mem` file even after the 
`.dict` file has been deleted from the temp folder. Repeating the same asserts 
should be sufficient for this test, which can be done by putting them into a 
separate method that is called multiple times (once for each call to 
`GetInstance()`). Name these methods `CheckBigramDictionary()` and 
`CheckWordDictionary()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to