Github user NightOwl888 commented on the issue:

    https://github.com/apache/lucenenet/pull/188
  
    Thanks.
    
    I am all for making this code run as efficiently and fast as possible, and 
agree there is room for improvement in the LRUHasMap. However, we are not 
inventing something new here, this is a port of an existing application and 
therefore must exhibit the same behavior. If you require a cache with different 
behavior than what is built-in, you can [implement your own 
ITaxonomyWriterCache](https://github.com/NightOwl888/lucenenet/blob/facet-bugz/src/Lucene.Net.Tests.Facet/Taxonomy/Directory/TestDirectoryTaxonomyWriter.cs#L275-L296).
 But without passing tests we are missing our mark for Lucene.Net.
    
    The [documentation 
comments](https://github.com/apache/lucene-solr/blob/8fdf89690404c0e65784b2c5477552b9dec58591/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/LRUHashMap.java#L24-L48)
 and supporting tests are very clear on how the LRUHashMap is supposed to 
behave. There is not supposed to be any bulk removal of elements, they are 
removed one by one as new elements are added. We only remove the *least 
recently removed* element. There are no expiration dates or number of accesses 
to contend with.
    
    I took a look at your implementation and it looks very similar to the 
original code that I changed to the current implementation. I am not sure why 
there is "Usage" being tracked in there, but it is certainly not something that 
is required to determine which element was *used least recently*. We don't 
really even need a date or timestamp for that - just an `int` counter would be 
enough. Also, we don't seem to be gaining anything from using 
ConcurrentDictionary since the only multi-step actions it supports are 
inadequate for LRU, so we would need locking anyway.
    
    But my current implementation could also use some work. I think that it 
might work best to break this up into 2 or more internal data structures - 
perhaps one SortedSet and one Dictionary. I suspect that adding an item to an 
existing SortedSet in the right position is far more efficient than sorting a 
*whole* set. Then when we come along to delete the *least recent* item, it is 
simply a matter of getting the key of the item from the top of the SortedSet 
and using that key in Dictionary.Delete.
    
    Perhaps it could be smarter than that, too. There could be a queue and some 
internal worker thread that manages the updates as well as the deletes in the 
background, but I am not sure if that would pass our tests, since they need to 
see the changes immediately.
    
    Anyway, this is worth another pass, which is why I noted it above.
    
    BTW, if you *really* want to thank me for doing this, please spend a 
weekend porting one of the remaining sections that doesn't have an open pull 
request. There are still a few sections that don't have any large library 
dependencies that would also need to be ported. Many of them were already 
ported in Lucene 3.0.3, which is a big help if you are stuck on trying to 
figure out how to make the pieces fit. We are almost there, but that doesn't 
mean we still don't need help!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to