NehanPathan opened a new pull request, #1154: URL: https://github.com/apache/lucenenet/pull/1154
---

### **Objective:**

This pull request (PR) optimizes the SmartCn dictionary loading process and introduces unit tests to ensure correctness and maintainability.

---

### **Key Changes:**

**1. Dictionary Optimization:**

- Replaced `ByteBuffer` with `BinaryReader.ReadInt32()` for faster and more efficient data reading.
- Used `ReadOnlySpan<char>` to reduce memory usage and improve performance.

**2. Comprehensive Unit Tests Added:**

- **Test File:** `DictionaryTests.cs`
  - Contains tests for loading dictionaries and verifying dictionary operations.
- **BigramDictionary Tests:**
  - `GetInstance()`: ensures correct singleton instantiation.
  - `LoadFromFile()`: verifies that the dictionary loads successfully from `bigramDict.dct`.
  - `GetFrequency()`: tests frequency retrieval for valid and non-existent entries.
- **WordDictionary Tests:**
  - `GetInstance()`: confirms proper instantiation.
  - Further tests can be added if `LoadMainDataFromFile()` becomes accessible (it is currently a private method).

**3. Resource Files Added:**

- **Location:** `Lucene.Net.Tests.Analysis.SmartCn.Resources`
  - `bigramDict.dct`
  - `coreDict.dct`

**4. Embedded Resource Loading:**

- Embedded both `.dct` files as resources in the test assembly to eliminate external dependencies.
- Created a utility in `LuceneTestCase` to extract these resources as temporary files during tests.

---

### **Testing Details:**

**Test Scenarios:**

- Validated successful loading of both `bigramDict.dct` and `coreDict.dct` from embedded resources.
- Checked frequency retrieval for valid entries (`hello`, `world`) and ensured non-existent entries return `0`.
- Verified that the `GetInstance()` method returns a non-null singleton instance.

**Assertions Included:**

- Frequency correctness for known entries.
- Proper dictionary instantiation.
- No regression in dictionary functionality.

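For readers unfamiliar with the two APIs named under Key Changes, here is a minimal, self-contained sketch of the general technique. It is not the code from this PR: the class `DctReadSketch`, its method names, and the byte values used with it are hypothetical, and the actual `.dct` layout is not described here. It only relies on two documented behaviors: `BinaryReader.ReadInt32()` decodes four bytes as a little-endian integer, and a `ReadOnlySpan<char>` can view part of a `char[]` without copying it.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class DctReadSketch
{
    // Reads `count` 32-bit integers from the stream.
    // BinaryReader.ReadInt32() always decodes little-endian bytes,
    // so no manual byte-order handling is needed.
    public static int[] ReadInt32s(Stream stream, int count)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            var values = new int[count];
            for (int i = 0; i < count; i++)
            {
                values[i] = reader.ReadInt32();
            }
            return values;
        }
    }

    // Compares a slice of a char buffer with a word without allocating
    // a substring: the span is only a view over the existing array.
    public static bool SliceEquals(char[] buffer, int start, int length, string word)
    {
        ReadOnlySpan<char> slice = buffer.AsSpan(start, length);
        return slice.SequenceEqual(word.AsSpan());
    }
}
```

As a usage example under the same assumptions, calling `DctReadSketch.ReadInt32s` on a `MemoryStream` over the bytes `01 00 00 00 00 01 00 00` with `count` set to 2 yields the values 1 and 256.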
---

### **Why These Changes?**

**Performance Improvements:**

- Faster dictionary loading with reduced memory overhead.

**Increased Test Coverage:**

- Ensures that dictionary operations work correctly and efficiently.

**Simplified Testing Workflow:**

- Embedded resource handling eliminates file path dependencies.

---

### **Future Considerations:**

**Testing `WordDictionary`:**

- Currently limited to `GetInstance()` because `LoadMainDataFromFile()` is private.
- Additional tests can be added when the method's visibility is updated.

**Performance Enhancements:**

- Future work may include further performance optimization of dictionary lookups and hash collision handling.

---

### **Issue Reference:**

Fixes #1153

---

### **Checklist:**

- [x] Read and followed the [Contributor Guide](https://github.com/apache/lucenenet/blob/main/CONTRIBUTING.md) and [Code of Conduct](https://www.apache.org/foundation/policies/conduct.html).
- [x] Included relevant unit or integration tests.
- [x] Added inline documentation where applicable.
- [x] Created an open issue and linked it to this PR.

---

## **How to Run Tests:**

1. Build the solution using `dotnet build`.
2. Run tests using `dotnet test` to verify that all dictionary operations work correctly.

---

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org