paulirwin commented on PR #1060: URL: https://github.com/apache/lucenenet/pull/1060#issuecomment-3222411253
Review from Claude Code: ## Pull Request Review: Issue #832 - Avoid Unnecessary FileInfo/DirectoryInfo Allocations ### **Summary** This PR optimizes file system operations throughout the Lucene.NET codebase by replacing `FileInfo` and `DirectoryInfo` object allocations with direct string path operations where appropriate. This is a performance optimization that reduces memory allocations and GC pressure. ### **✅ Strengths** 1. **Consistent Performance Improvement**: The changes systematically replace temporary `FileInfo`/`DirectoryInfo` allocations with string paths, reducing object creation overhead. 2. **Proper Documentation**: All changes are well-documented with clear comments explaining the rationale (`// LUCENENET specific: changed to use string ... instead of allocating a FileInfo (#832)`). 3. **Backwards Compatibility**: The PR maintains existing public APIs by providing overloads, ensuring no breaking changes for consumers. 4. **Scope Appropriateness**: Changes are focused on internal implementations and areas where the objects were only used to get path strings. ### **⚠️ Areas of Concern** 1. **Redundant Method Chaining**: Several methods now just delegate to other overloads (e.g., in `HyphenationTree.cs` and `PatternParser.cs`). While this reduces allocations, it creates a chain of method calls that could be simplified. 2. **Missing Null Checks**: Some string path parameters don't appear to have explicit null validation where the original `FileInfo` parameters would have thrown `ArgumentNullException`. 3. **Potential Path Resolution Differences**: `FileInfo.FullName` normalizes paths differently than raw string paths. Need to ensure this doesn't introduce subtle behavioral changes. ### **🔍 Specific Observations** 1. **IOUtils.cs**: New overloads for `GetDecodingReader` methods properly handle string paths and reduce allocations. 2. **JaspellTernarySearchTrie.cs**: The expanded file reading logic with proper resource disposal is an improvement over the original implementation. 3. **Store Directory Classes**: Changes to `SimpleFSDirectory`, `NIOFSDirectory`, and `MMapDirectory` consistently use `Path.Combine` instead of creating `FileInfo` objects. 4. **Test File**: The test modification in `TestFilesystemResourceLoader.cs` correctly updates to match the new constructor signature. ### **📋 Recommendations** 1. **Add Null Validation**: Consider adding explicit null checks for string path parameters to match the behavior of `FileInfo` constructors. 2. **Consider Path Normalization**: Verify that switching from `FileInfo.FullName` to raw string paths doesn't cause issues with relative paths or path separators. 3. **Performance Testing**: Run benchmarks to quantify the performance improvements, especially in scenarios with high file I/O operations. 4. **Documentation Update**: Update XML documentation for methods that now accept string paths to clarify any behavioral differences. ### **Verdict: ✅ APPROVED with minor suggestions** This PR successfully addresses the goal of reducing unnecessary allocations while maintaining API compatibility. The changes are well-documented and follow a consistent pattern. The minor suggestions above would further improve robustness but are not blocking issues. --- As a result of those suggestions, I did enable nullable reference type checking on StopAnalyzer and StopwordAnalyzerBase to resolve that concern. "Redundant method chaining" is to keep with matching upstream code in many cases, and "Potential path resolution differences" does not seem to have much of an impact for the cases that CC came up with (i.e. mixing slashes). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org