NightOwl888 opened a new issue, #1059:
URL: https://github.com/apache/lucenenet/issues/1059

   ### Is there an existing issue for this?
   
   - [X] I have searched the existing issues
   
   ### Task description
   
   There were several workarounds done during the initial port of 
Lucene.Net.Grouping that make it less usable and even less performant than the 
grouping module in Lucene 4.8.1.
   
   1. It is using `IEnumerable<T>` in places where Lucene used the Java 
equivalent of `ICollection<T>`
   2. Since we are using `IEnumeable<T>`, we have to use LINQ `Count()` and 
`Any()` in a few places
   3. We are using non-generic collection interfaces to get around the fact 
that .NET doesn't support wildcard generics
   4. The `Collector` abstract class in Lucene.Net.Queries was replaced with an 
`ICollector` interface to make it covariant
   
   The fundamental problem with Lucene.Net.Grouping is that C# doesn't support 
wildcard generics and the Lucene design requires them. When I ported it, it 
took several tries just to find something that would compile and this is 
essentially our first (failed) attempt. It functions, but it has problems that 
rippled throughout the search namespace. It has bottlenecks that didn't exist 
in Java where we use LINQ and extra collection allocations. These bottlenecks 
will not be fixable after the Lucene.NET 4.8.0 release because they require 
breaking API changes, both to Lucene.Net.Queries and Lucene.Net.Grouping.
   
   Through the high-level analysis work that @rclabo did, we learned that the 
crux of the issue is with the `GroupingSearch` class. It is a God Object that 
the user can see all of the grouping options on. But since it has fields that 
need to be multiple different generic types, it doesn't work in C#. Ron 
refactored the search method into different types of search, but I think it 
still requires covariance to exist. That is where we are today.
   
   I suspect we can fix all of the incompatibles by correcting the God Object 
design and following the Single Responsibility Principle. There are 3 different 
types of grouping searches (each defined by a separate constructor):
   
   - Group by Field - groups documents by index terms using the `FieldCache`
   - Group by Function - groups documents by function using a `ValueSource`
   - Group by Doc Block - can only be used when documents belonging in a group 
are indexed in one block
   
   My thought is to start by refactoring the `GroupingSearch` class into 
`FieldGroupingSearch`, `FunctionGroupingSearch`, and `DocBlockGroupingSearch`, 
separating all of the fields into their respective classes. If there is any 
shared functionality remaining, create a new class named 
`AbstractGroupingSearch` to put this shared state and members in. This removes 
the need to have anything resembling Java wildcard generics because we will 
have separate classes with their own generic types.
   
   This will ripple changes through the Lucene.Net.Grouping project, but we 
should aim to remove the extra interfaces we created just to be able to declare 
types as covariant (which was a workaround we used for C#'s lack of wildcard 
generics).
   
   - `IAbstractAllGroupsCollector<out TGroupValue>`
   - `AbstractDistinctValuesCollector.IGroupCount<out TGroupValue>`
   - `IAbstractDistinctValuesCollector<out GC>`
   - `IAbstractFirstPassGroupingCollector<out TGroupValue>`
   - `IAbstractSecondPassGroupingCollector<out TGroupValue>`
   - `IGroupDocs<out TGroupValue>`
   - `ISearchGroup<out TGroupValue>`
   - `ITopGroups<out TGroupValue>`
   - `ICollector` (in Lucene.Net.Queries). Note that we could keep this 
interface, but we should use the original `Collector` abstract class design. It 
has a constant declaration that cannot be put into an interface, so we have to 
have a class, anyway.
   
   > NOTE: We may still require non-generic interfaces to work around issues 
with calling members on generic types without referring to a generic closing 
type, but the goal here is to remove the need for covariance.
   
   Once we remove the interfaces, the violations to the SRP, and fix the 4 
issues above, we can then use the class `GroupingSearch` to either delegate the 
call to the specialized `GroupingSearch` classes (preferred - this would give 
us a similar UX as Lucene) or to add static factory methods to create instances 
of them. This is going to require some creativity. I am not sure the former is 
possible because of the different return types for each search method (which 
was the primary reason we needed covariance to port it), but separating the 
functionality will likely reveal more options than we previously had.
   
   > The idea is that we will ultimately "put it all in one place" in 
`GroupingSearch` to make it somewhat similar to Lucene, even though all we are 
doing is delegating calls or creating a single jump point to assist with 
creating the right class instance to do the real work.
   
   ## Deliverables
   
   - [ ] Factor out the need for anything resembling Java wildcard generics
   - [ ] Remove the extra covariant interfaces
   - [ ] Restore `Collector` back to its original abstract class design as it 
was in Lucene 4.8.1
   - [ ] Restore `IEnumerable<T>` declarations to `ICollection<T>` as they were 
in Lucene 4.8.1
   - [ ] Remove the calls to LINQ methods on `IEnuerable<T>` and restore 
`ICollection<T>.Count` as it was in Lucene 4.8.1
   - [ ] Restore generic collections where we are currently using `ICollection` 
and `IDictionary`
   - [ ] Remove extra collection allocations (see the `LUCENENET TODO: Try to 
work out how to do this without an O(n) operation` lines)
   - [ ] Create a UX as similar as possible to Lucene 4.8.1 and update the 
documentation to show usage


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to