[GitHub] lucenenet issue #188: Fixed 64 Failing Facet Tests and Finished Facet Implem...

NightOwl888 Thu, 29 Sep 2016 03:58:57 -0700

Github user NightOwl888 commented on the issue:

https://github.com/apache/lucenenet/pull/188

Thanks.

I am all for making this code run as efficiently and fast as possible, and
agree there is room for improvement in the LRUHasMap. However, we are not
inventing something new here, this is a port of an existing application and
therefore must exhibit the same behavior. If you require a cache with different
behavior than what is built-in, you can [implement your own
ITaxonomyWriterCache](https://github.com/NightOwl888/lucenenet/blob/facet-bugz/src/Lucene.Net.Tests.Facet/Taxonomy/Directory/TestDirectoryTaxonomyWriter.cs#L275-L296).
But without passing tests we are missing our mark for Lucene.Net.

The [documentation
comments](https://github.com/apache/lucene-solr/blob/8fdf89690404c0e65784b2c5477552b9dec58591/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/LRUHashMap.java#L24-L48)
and supporting tests are very clear on how the LRUHashMap is supposed to
behave. There is not supposed to be any bulk removal of elements, they are
removed one by one as new elements are added. We only remove the *least
recently removed* element. There are no expiration dates or number of accesses
to contend with.

I took a look at your implementation and it looks very similar to the
original code that I changed to the current implementation. I am not sure why
there is "Usage" being tracked in there, but it is certainly not something that
is required to determine which element was *used least recently*. We don't
really even need a date or timestamp for that - just an `int` counter would be
enough. Also, we don't seem to be gaining anything from using
ConcurrentDictionary since the only multi-step actions it supports are
inadequate for LRU, so we would need locking anyway.

But my current implementation could also use some work. I think that it
might work best to break this up into 2 or more internal data structures -
perhaps one SortedSet and one Dictionary. I suspect that adding an item to an
existing SortedSet in the right position is far more efficient than sorting a
*whole* set. Then when we come along to delete the *least recent* item, it is
simply a matter of getting the key of the item from the top of the SortedSet
and using that key in Dictionary.Delete.

Perhaps it could be smarter than that, too. There could be a queue and some
internal worker thread that manages the updates as well as the deletes in the
background, but I am not sure if that would pass our tests, since they need to
see the changes immediately.

Anyway, this is worth another pass, which is why I noted it above.

BTW, if you *really* want to thank me for doing this, please spend a
weekend porting one of the remaining sections that doesn't have an open pull
request. There are still a few sections that don't have any large library
dependencies that would also need to be ported. Many of them were already
ported in Lucene 3.0.3, which is a big help if you are stuck on trying to
figure out how to make the pieces fit. We are almost there, but that doesn't
mean we still don't need help!




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

[GitHub] lucenenet issue #188: Fixed 64 Failing Facet Tests and Finished Facet Implem...

Reply via email to