NightOwl888 commented on code in PR #1154: URL: https://github.com/apache/lucenenet/pull/1154#discussion_r2051756865
########## src/Lucene.Net.Tests.Analysis.SmartCn/DictionaryTests.cs: ########## @@ -0,0 +1,72 @@ +using Lucene.Net.Util; +using Lucene.Net.Analysis.Cn.Smart.Hhmm; +using Lucene.Net.Attributes; +using NUnit.Framework; +using System; +using System.IO; +using System.Reflection; + + +[TestFixture] +[LuceneNetSpecific] +public class DictionaryTests : LuceneTestCase Review Comment: Actually, there is no reason to add a reference on the J2N package because it is already referenced by the `Lucene.Net` assembly. All submodules depend on `Lucene.Net` and J2N is a transitive (implicit) dependency that is loaded automatically. As for the extension method, there is a guide [here](https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/how-to-implement-and-call-a-custom-extension-method) that demonstrates how to call them. You need to add the `using` directive of the namespace where the class the extension method is defined in is located. In this case, the `J2N` namespace. ```c# using J2N; ``` > TIP: If you are using Visual Studio 2022, it will display a red underline when it doesn't recognize a method call and if you hover over the underlined text, it will suggest options such importing the namespace for the extension method and show namespaces where extension methods that fit the method signature are located. ## FindAndGetManifestResourceStream BTW - I made a couple of errors when I provided instructions previously. First of all, the method is named `FindAndGetManifestResourceStream()`, not `FindAndGetResourceStream()`. Secondly, for the `FindAndGetManifestResourceStream()` method to function, the namespace and directory structure that the resource is in must match exactly *or* the best match with the assembly namespace prefix and the name suffix passed into `FindAndGetManifestResourceStream()` used must be resolvable to the complete name. So, the simplest approach here would be to put the `DictionaryTests` class and resource files in the `Hhmm` directory instead of the `Support/Hhmm` directory, otherwise we would need to hard code `Support.` into the path of the resource for it to be found. This is the approach we took with other tests that we added, such as [`TestBuildDictionary.cs` in the kuromoji module](https://github.com/apache/lucenenet/blob/a0578d6ea2c17c06be925adb15acd3c64d5fc824/src/Lucene.Net.Tests.Analysis.Kuromoji/Tools/TestBuildDictionary.cs). The code snippet above looks great! There are only 6 changes I would suggest: ### Change 1 We will need to add the line `base.OneTimeSetUp();` before executing anything inside of the `OneTimeSetUp()` method (a detail I forgot to mention). This is required by `LuceneTestCase` to function correctly. ```c# public override void OneTimeSetUp() { base.OneTimeSetUp(); tempDir = CreateTempDir("smartcn-data"); AnalyzerProfile.ANALYSIS_DATA_DIR = tempDir.FullName; CopyResourceToFile(BigramFileName); CopyResourceToFile(CoreFileName); } ``` ### Change 2 It is slightly more efficient to call the method using the `typeof()` operator instead of `this.GetType()`. ```c# using var resourceStream = typeof(DictionaryTests).FindAndGetManifestResourceStream(resourceName); ``` > TIP: To determine the resource path that the `FindAndGetManifestResourceStream(resourceName)` method found, there is another extension method named `FindResource(string)` that executes the same logic and returns a string with the .NET resource path rather than a `Stream`. ### Change 3 Please eliminate the variables and inline the calls to `Assert.AreEqual()` in the `WordDictionary` tests as was the case in `TestBigramDictionary()`. ```c# WordDictionary wordDict = WordDictionary.GetInstance(); Assert.AreEqual(30, wordDict.GetFrequency("尼".ToCharArray()), "Frequency for '尼' is incorrect."); Assert.AreEqual(0, wordDict.GetFrequency("missing".ToCharArray()), "Expected frequency 0 for unknown word."); ``` ### Change 4 Please rename the test methods to be consistent. Let's change the name from `TestWordDictionaryGetInstance()` to `TestWordDictionary()`. ### Change 5 To be consistent with kuromoji, let's change the name of the test class (and file) to `TestBuildDictionary`. As I previously mentioned, leave the namspace as `Lucene.Net.Analysis.Cn.Smart.Hhmm` but place the file in the `Hhmm` directory along with the resource files. I tested `FindAndGetManifestResourceStream()` and it will locate them with these conventions. I will leave it up to you whether to put the `.dict` files into a `.zip` file and use `TestUtil.Unzip()` to copy them instead of using individual files and a custom copy method. The [`TestBuildDictionary.cs` in the kuromoji module](https://github.com/apache/lucenenet/blob/a0578d6ea2c17c06be925adb15acd3c64d5fc824/src/Lucene.Net.Tests.Analysis.Kuromoji/Tools/TestBuildDictionary.cs) has an example of the zipped approach, which can be accomplished with less code. ### Change 6 Note that we are specifically testing *building* the dictionaries. Most users will likely use the built-in ones, but this feature allows customization. When `ANALYSIS_DATA_DIR` is set, the first call to `GetInstance()` will build the dictionary and then call `SaveToObj()` which creates the `.mem` file (a binary file) in the `ANALYSIS_DATA_DIR`. After that point, the `.dict` file can be deleted as all future calls to `GetInstance()` will load the data from the binary `.mem` file. As such, let's add additional asserts in each test to make sure that the `.mem` file is created in the temp folder and that subsequent calls to `GetInstance()` can still load the data from the `.mem` file even after the `.dict` file has been deleted from the temp folder. Repeating the same asserts should be sufficient for this test, which can be done by putting them into a separate method that is called multiple times (once for each call to `GetInstance()`). Name these methods `CheckBigramDictionary()` and `CheckWordDictionary()`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org