NightOwl888 opened a new issue, #1059: URL: https://github.com/apache/lucenenet/issues/1059
### Is there an existing issue for this? - [X] I have searched the existing issues ### Task description There were several workarounds done during the initial port of Lucene.Net.Grouping that make it less usable and even less performant than the grouping module in Lucene 4.8.1. 1. It is using `IEnumerable<T>` in places where Lucene used the Java equivalent of `ICollection<T>` 2. Since we are using `IEnumeable<T>`, we have to use LINQ `Count()` and `Any()` in a few places 3. We are using non-generic collection interfaces to get around the fact that .NET doesn't support wildcard generics 4. The `Collector` abstract class in Lucene.Net.Queries was replaced with an `ICollector` interface to make it covariant The fundamental problem with Lucene.Net.Grouping is that C# doesn't support wildcard generics and the Lucene design requires them. When I ported it, it took several tries just to find something that would compile and this is essentially our first (failed) attempt. It functions, but it has problems that rippled throughout the search namespace. It has bottlenecks that didn't exist in Java where we use LINQ and extra collection allocations. These bottlenecks will not be fixable after the Lucene.NET 4.8.0 release because they require breaking API changes, both to Lucene.Net.Queries and Lucene.Net.Grouping. Through the high-level analysis work that @rclabo did, we learned that the crux of the issue is with the `GroupingSearch` class. It is a God Object that the user can see all of the grouping options on. But since it has fields that need to be multiple different generic types, it doesn't work in C#. Ron refactored the search method into different types of search, but I think it still requires covariance to exist. That is where we are today. I suspect we can fix all of the incompatibles by correcting the God Object design and following the Single Responsibility Principle. There are 3 different types of grouping searches (each defined by a separate constructor): - Group by Field - groups documents by index terms using the `FieldCache` - Group by Function - groups documents by function using a `ValueSource` - Group by Doc Block - can only be used when documents belonging in a group are indexed in one block My thought is to start by refactoring the `GroupingSearch` class into `FieldGroupingSearch`, `FunctionGroupingSearch`, and `DocBlockGroupingSearch`, separating all of the fields into their respective classes. If there is any shared functionality remaining, create a new class named `AbstractGroupingSearch` to put this shared state and members in. This removes the need to have anything resembling Java wildcard generics because we will have separate classes with their own generic types. This will ripple changes through the Lucene.Net.Grouping project, but we should aim to remove the extra interfaces we created just to be able to declare types as covariant (which was a workaround we used for C#'s lack of wildcard generics). - `IAbstractAllGroupsCollector<out TGroupValue>` - `AbstractDistinctValuesCollector.IGroupCount<out TGroupValue>` - `IAbstractDistinctValuesCollector<out GC>` - `IAbstractFirstPassGroupingCollector<out TGroupValue>` - `IAbstractSecondPassGroupingCollector<out TGroupValue>` - `IGroupDocs<out TGroupValue>` - `ISearchGroup<out TGroupValue>` - `ITopGroups<out TGroupValue>` - `ICollector` (in Lucene.Net.Queries). Note that we could keep this interface, but we should use the original `Collector` abstract class design. It has a constant declaration that cannot be put into an interface, so we have to have a class, anyway. > NOTE: We may still require non-generic interfaces to work around issues with calling members on generic types without referring to a generic closing type, but the goal here is to remove the need for covariance. Once we remove the interfaces, the violations to the SRP, and fix the 4 issues above, we can then use the class `GroupingSearch` to either delegate the call to the specialized `GroupingSearch` classes (preferred - this would give us a similar UX as Lucene) or to add static factory methods to create instances of them. This is going to require some creativity. I am not sure the former is possible because of the different return types for each search method (which was the primary reason we needed covariance to port it), but separating the functionality will likely reveal more options than we previously had. > The idea is that we will ultimately "put it all in one place" in `GroupingSearch` to make it somewhat similar to Lucene, even though all we are doing is delegating calls or creating a single jump point to assist with creating the right class instance to do the real work. ## Deliverables - [ ] Factor out the need for anything resembling Java wildcard generics - [ ] Remove the extra covariant interfaces - [ ] Restore `Collector` back to its original abstract class design as it was in Lucene 4.8.1 - [ ] Restore `IEnumerable<T>` declarations to `ICollection<T>` as they were in Lucene 4.8.1 - [ ] Remove the calls to LINQ methods on `IEnuerable<T>` and restore `ICollection<T>.Count` as it was in Lucene 4.8.1 - [ ] Restore generic collections where we are currently using `ICollection` and `IDictionary` - [ ] Remove extra collection allocations (see the `LUCENENET TODO: Try to work out how to do this without an O(n) operation` lines) - [ ] Create a UX as similar as possible to Lucene 4.8.1 and update the documentation to show usage -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org