GitHub user NightOwl888 opened a pull request:

    https://github.com/apache/lucenenet/pull/193

    Completed Grouping Implementation

    This adds the remaining Grouping classes, the Grouping tests, as well as 
the support classes (`TreeSet` and `LinkedHashMap`) required by Grouping. 12/12 
test are passing.
    
    There is also a sweeping bug fixed in the test framework - the 
`TestUtil.RandomRealisticUnicodeString()` method was returning only random 
series of numbers, not Unicode characters. Fortunately, the impact wasn't as 
bad as I thought it would be - there are ~6 additional tests solution-wide 
failing due to this fix.
    
    I am posting this as a pull request rather than pushing it to master for 2 
reasons:
    
    1. I am hoping it helps get #191 merged to master sooner (@conniey, let me 
know if it would be more helpful or harmful to merge this before you finish the 
.NET core integration).
    2. There are a few things about this that need more work and further 
discussion.
    
    API Refactoring
    -----
    
    Due to the liberal use of Java wildcard generics, it was a challenge 
putting something together that would compile. One of the solutions I came up 
with was to use .NET covariance to fix some of the issues. Unfortunately, 
covariance in .NET is only supported on interfaces and the fact that the Lucene 
designers made everything into abstract classes didn't help. Basically, we now 
have abstract classes that are backed by another abstraction (interfaces).
    
    Furthermore, since the Collector abstraction is also an abstract class 
rather than an interface, it means these Collector interfaces cannot also be 
Collectors (which means lots of casting). This is because interfaces cannot 
inherit abstract classes. So, there are 3 different ways to go from here to get 
rid of most of the casting:
    
    1. Change the Collector abstract class into an interface
    2. Create an interface to back the Collector, similar to what was done for 
Grouping
    3. Use a different solution than covariant interfaces to mimic Java 
wildcard generics
    
    I am hoping someone can present another option than covariant interfaces 
that doesn't result in lots of casting, but that may not be a possibility. IMO, 
it would make more sense to change Collector into an interface (since it has no 
code and only public members) than to create a backing interface, which would 
help to avoid confusion over the double-abstraction.
    
    What I mean is that pretty much every place where Collector abstract class 
is used now would need to change to the Collector interface to ensure that 
covariant Collector interfaces can inherit Collector, thus making them into a 
Collector. Keeping a Collector abstract class around after it is essentially no 
longer called out anywhere (except in subclasses) might be cause for confusion.
    
    Also, it would be helpful if someone would work with Grouping and let us 
know of any other issues with the API that could be improved.
    
    C5's TreeSet & TreeMap
    ----
    
    I brought over the TreeSet and TreeDictionary from C5, along with the 
applicable tests. Unfortunately, this drags most of the C5 library along with 
it. We could alternatively reference it via NuGet, but there is not yet .NET 
core support (just an open [pull 
request](https://github.com/sestoft/C5/pull/50)).
    
    Also, despite being a "set", it does not implement `ISet`, which I have now 
done. It isn't a problem yet, but may eventually be needed.
    
    LinkedHashMap
    -----
    
    Unfortunately, the `LurchTable` that worked well for an LRU cache doesn't 
keep track of insertion order (even when you specify that option) and therefore 
doesn't replace `LinkedHashMap` as was advertised. So, I ended up modifying 
[this solution](http://qiita.com/matarillo/items/c09e0f3e5a61f84a51e2) (the 
second option) to be backed by `HashMap` rather than `Dictionary` so it can 
support `null` keys.
    
    `LinkedHashMap`'s main reason for being is that it keeps track of the 
insertion order of items. We have unwittingly replaced it with `Dictionary` in 
several places. The fact that the tests pass is happenstance - `Dictionary` 
doesn't guarantee insertion order, but if you add items only and don't delete 
and re-add, they will enumerate in insertion order. That said, there is nothing 
stopping someone from changing this behavior from one .NET version to another 
and we can't rely on it. We should go back and find all of the places where 
`LinkedHashMap` is used in Lucene where we are using Dictionary, replace, and 
test thoroughly.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NightOwl888/lucenenet grouping

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucenenet/pull/193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #193
    
----
commit 0844f41c1b17eb4ca9bc395987189cce44aa91b8
Author: Shad Storhaug <[email protected]>
Date:   2016-10-27T11:54:19Z

    Added LuceneNetSpecific attribute to Support.DataInputStream and 
Support.DataOutputStream tests

commit 5f198526ed04616007e8baa03480fa7df327c1b3
Author: Shad Storhaug <[email protected]>
Date:   2016-10-27T11:43:54Z

    Added TreeSet and TreeDictionary from C5 to the Support namespace

commit dc8a4b8e219aea297101eab2b790ffe7d9e6d86d
Author: Shad Storhaug <[email protected]>
Date:   2016-11-05T09:48:05Z

    Added unit tests for Core.Support.HashMap

commit a0f684de9dbf46a7f9a0119609318584b6c16011
Author: Shad Storhaug <[email protected]>
Date:   2016-11-06T13:28:27Z

    Core.Support.TreeSet: Implemented ISet<T>, since it was missing from the 
original C5 implementation

commit 63fa4caff3c10f2e9ad9c63c6b24043534f8a2a6
Author: Shad Storhaug <[email protected]>
Date:   2016-11-05T09:51:13Z

    Added object overrides to Core.Support.HashMap so it can be used as a 
dictionary key based on values within the dictionary rather than by reference 
equality.

commit 7723e26d74297b714fa3b67cef940f6072322296
Author: Shad Storhaug <[email protected]>
Date:   2016-11-05T12:52:23Z

    Added Core.Support.LinkedHashMap + tests for supporting guaranteed 
insertion order enumeration of an IDictionary structure.

commit d67ed2b2a602a6528fa990bd5636f7cba01546b1
Author: Shad Storhaug <[email protected]>
Date:   2016-11-07T19:18:54Z

    Revert "HACK: Added stubs for all tests subclasses of abstract test classes 
(with [Test] attributes) and commented the [Test] attributes in the abstract 
classes to keep the tests from running in the wrong context."
    
    This reverts commit 4dbc3590361814d13fae64c8d030820eb4987489.

commit a209d7183138d3f2179b03113626ce5708f6e835
Author: Shad Storhaug <[email protected]>
Date:   2016-11-07T19:19:34Z

    HACK: (3rd) Added stubs for all tests subclasses of abstract test classes 
(with [Test] attributes) and commented the [Test] attributes in the abstract 
classes to keep the tests from running in the wrong context.

commit 8af1037d09e94a62f5a9efae3e1ceba6f6ec0ada
Author: Shad Storhaug <[email protected]>
Date:   2016-11-02T16:05:15Z

    Fixed bug in Core.Util.Mutable.MutableValue - default value of Exists 
property was not being set to true.

commit 8cc1a3f443d1715919f61134cdf771b05c9500ef
Author: Shad Storhaug <[email protected]>
Date:   2016-11-02T16:06:08Z

    Added TODO to rename the Core.Search.CachingCollector.Cached property to 
IsCached.

commit 9d72bcb3469dedd6bac66c3ee82bfc38e80e0eba
Author: Shad Storhaug <[email protected]>
Date:   2016-10-27T07:07:36Z

    WIP on Grouping

commit 081ce8c35473e96025d5631ffa929f94847ecf18
Author: Shad Storhaug <[email protected]>
Date:   2016-11-03T11:18:25Z

    Added StringBuilder.AppendCodePoint() extension method to 
Core.Support.StringBuilderExtensions + added unit tests. Fixed buggy 
StringBuilder.Reverse() method to account for Unicode.

commit c3abdc7398d0ac1d0c09dc8b30362d62b99eefe4
Author: Shad Storhaug <[email protected]>
Date:   2016-11-03T11:20:03Z

    Fixed bug in TestFramework.Util.TestUtil.RandomRealisticUnicodeString() 
method that wasn't converting the code points into characters (so we were just 
getting random sequences of numbers instead of Unicode).

commit b689479e06857cd40106e632136d9b1dd7c3cd4d
Author: Shad Storhaug <[email protected]>
Date:   2016-10-27T11:45:26Z

    Fixed namespace issue in Join.ToParentBlockJoinCollector

commit a8f7f42aedd968ae73a4bc1be3b55294ec66521f
Author: Shad Storhaug <[email protected]>
Date:   2016-10-27T11:46:13Z

    Completed implementation of the Grouping.AbstractGroupFacetCollector using 
the TreeSet.

commit 0a137bea5285ac0022861b67339665ace2ae9f49
Author: Shad Storhaug <[email protected]>
Date:   2016-11-02T13:37:20Z

    Added interfaces to GroupDocs, SearchGroup, and TopGroups to apply 
covariance so they act more like the wildcard generics that were used in Java.

commit 28cd391216597828674b90ad477966326a5e34f9
Author: Shad Storhaug <[email protected]>
Date:   2016-11-02T16:08:22Z

    Ported Grouping.GroupingSearchTest and fixed several bugs.

commit 7c2c58138999bb31375bccba3e0d8474402e2e83
Author: Shad Storhaug <[email protected]>
Date:   2016-11-05T13:24:15Z

    Fixed compile issues with Join.TestBlockJoin after applying IGroupDocs in 
place of GroupDocs on TopGroups.

commit f08a0e8d7059d507bfad62add71ed073904ab6c0
Author: Shad Storhaug <[email protected]>
Date:   2016-11-05T13:36:05Z

    Added CopyTo implementation in LinkedHashMap.KeyCollection and 
ValueCollection

commit fa5f44047ccf406b17dd49530009007de0367462
Author: Shad Storhaug <[email protected]>
Date:   2016-11-05T13:37:29Z

    Implemented IComparable in Core.Util.Mutable.MutableValue, IComparable<T> 
does not automatically bring it along.

commit 44c29eb8921d0c3d08493de6e2d3333b51aebe53
Author: Shad Storhaug <[email protected]>
Date:   2016-11-05T13:38:20Z

    Ported Grouping.DistinctValuesCollectorTest

commit 4b914b7231fd7e4029fea2108405f3c5d09df398
Author: Shad Storhaug <[email protected]>
Date:   2016-11-05T18:20:40Z

    Ported Grouping.TestGrouping

commit 7c01e8865290dae7f3367c6e9fb7c00d8d76f8a6
Author: Shad Storhaug <[email protected]>
Date:   2016-11-05T18:39:42Z

    Fixed random bug in SearchGroup - replaced SortedSet with TreeSet

commit 840c67b6562a29979ed4b5572ebaac1a4dd392b8
Author: Shad Storhaug <[email protected]>
Date:   2016-11-06T13:29:33Z

    Fixed several string comparison bugs in Grouping tests

commit 95bb669287a986c676b6e4ecf5f95d48341cacf1
Author: Shad Storhaug <[email protected]>
Date:   2016-11-06T14:47:59Z

    Renamed AbstractGroupHead to AbstractAllGroupHeadsCollector_GroupHead so it 
is similar to its original name, AbstractAllGroupHeadsCollector.GroupHead. Due 
to conflicts with other classes named GroupHead in derived classes of 
AbstractAllGroupHeadsCollector, the original name could not be used.

commit 41b9732e9a11f3ad04f8633ce47857c7c98e18e0
Author: Shad Storhaug <[email protected]>
Date:   2016-11-06T15:19:37Z

    Removed superfluous IGroupCount interface. Created non-generic 
AbstractDistinctValuesCollector class to nest IGroupCount and GroupCount so the 
syntax is similar to Lucene.

commit dd24cf3a0e30c963393939a1b5ba45e009c709d0
Author: Shad Storhaug <[email protected]>
Date:   2016-11-06T16:15:52Z

    Grouping: Removed commented code.

commit 5c9a388e098bdcc25fda89395ab8449f1480d32a
Author: Shad Storhaug <[email protected]>
Date:   2016-11-06T16:22:38Z

    Added missing fail() in Grouping.TestGrouping test

commit 3881180835b2c605b0fc587c54f2676c7f521f1e
Author: Shad Storhaug <[email protected]>
Date:   2016-11-06T16:31:51Z

    Cleaned up usings in Grouping.

commit d0838cc3706f93a36557e1aa783f0af523aeace2
Author: Shad Storhaug <[email protected]>
Date:   2016-11-06T16:51:21Z

    Grouping: Test code cleanup to verify API

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to