Luc,
I'm not sure that I completely agree with your assessment of the WeakHashMap.  

It does assume that the hash value is unique, which I would argue is correct 
usage.  It is the responsibility of object.GetHashCode() to return a unique 
value.  This is a problem in any Hashtable implementation, and although I 
haven't seen the internal Java code, I assume it has the same limitation.  The 
object would be the key only superficially, as the hash code would be used 
internally as well (judging by the name of the class, WeakHashMap, anyway).  
If this is incorrect, I apologize for the noise.

I did, however, notice that GetHashCode() is not overridden in IndexReader, 
and since the .NET Framework does not guarantee a unique value (according to 
the documentation for object.GetHashCode()), this could present the problem 
you have identified.  Is there a need to guarantee uniqueness by overriding 
the method, or does anyone know whether the value is always unique already?  
When the Cache object was implemented with a Hashtable prior to this patch, 
the same problem would have been inherent in that implementation (of course 
with the additional bonus of the memory leak ;)), and I don't think I ever saw 
collisions, although quite honestly I was never really looking for them.
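
For what it's worth, here is a minimal sketch of how a collision plays out 
when a map keys on the hash code alone versus on the object itself.  The 
ConstantHash class is hypothetical, used only to force a deterministic 
collision:

    using System;
    using System.Collections;

    // Hypothetical type used only to force a collision; any two distinct
    // objects may legally share a hash code, this just makes it deterministic.
    class ConstantHash
    {
        public override int GetHashCode() { return 42; }
    }

    class HashKeyDemo
    {
        static void Main()
        {
            ConstantHash a = new ConstantHash();
            ConstantHash b = new ConstantHash();

            // Keyed on the hash code alone, the table cannot tell a and b apart:
            Hashtable byHashCode = new Hashtable();
            byHashCode[a.GetHashCode()] = "cached for a";
            byHashCode[b.GetHashCode()] = "cached for b"; // silently replaces a's entry
            Console.WriteLine(byHashCode[a.GetHashCode()]); // prints "cached for b"

            // Keyed on the objects themselves, Hashtable falls back on Equals()
            // (reference equality here), so both entries survive the collision:
            Hashtable byObject = new Hashtable();
            byObject[a] = "cached for a";
            byObject[b] = "cached for b";
            Console.WriteLine(byObject[a]); // prints "cached for a"
        }
    }

So when the map keys on the object, a collision only costs performance, not 
correctness; keying on the hash code alone loses that safety net.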

Also, you said that the call to WeakHashMap.Keys could return NULL values.  
This may be the reason the Java implementation supports null keys.  It does 
seem to be a possibility unless the call to Keys checks that the 
WeakReference's Target is not null at the time of iteration; copying the 
Target into a local first would eliminate the possibility, since we would then 
hold a strong reference to the object, i.e.: object key = 
entries[i].Key.Target; if (key != null) keys.Add(key);.  The same problem 
would exist with the Values property.
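
A sketch of what that might look like; the Entry class and entries list are 
hypothetical stand-ins for whatever the map uses internally:

    using System;
    using System.Collections;

    public class WeakKeySnapshotExample
    {
        // Hypothetical internal entry: a weakly referenced key plus its value.
        private class Entry
        {
            public WeakReference Key;
            public object Value;
        }

        private readonly ArrayList entries = new ArrayList(); // of Entry

        public ICollection Keys
        {
            get
            {
                ArrayList keys = new ArrayList();
                for (int i = 0; i < entries.Count; i++)
                {
                    // Copying Target into a local creates a strong reference,
                    // so the key cannot be collected between this null check
                    // and the Add() call below.
                    object key = ((Entry)entries[i]).Key.Target;
                    if (key != null)
                        keys.Add(key); // never null
                }
                return keys;
            }
        }
    }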

Also, optimizing for a small number of IndexReaders in the WeakHashMap is an 
assumption that specializes the class for IndexReaders; there is no guarantee 
that this will be the only use case in the future.  The Java documentation 
says that WeakHashMap uses the same initial parameters (load factor and 
capacity) as HashMap, which suggests that the .NET implementation should 
likewise keep the same defaults as Hashtable, its underlying container.

Comments?
-tim

-----Original Message-----
From: Vanlerberghe, Luc [mailto:luc.vanlerber...@bvdep.com] 
Sent: Wednesday, December 31, 2008 4:48 AM
To: lucene-net-dev@incubator.apache.org
Subject: Problems with patch for LUCENENET-106

Hi,

I reviewed the fix for LUCENENET-106 and I see a couple of issues:

1. The implementation of WeakHashMap in SupportClass.cs has a bug in that it 
assumes all keys will have a unique HashCode.  Although the chances of a 
collision are *very* small, uniqueness cannot be guaranteed.  A collision 
would mean that the cached value for one IndexReader could be returned as the 
cached value for another IndexReader (taking the way this class is used in 
Lucene as the example).
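
To illustrate what a collision-tolerant version has to do, here is a 
hypothetical sketch (the names are mine, not the actual patch): the hash code 
only selects a bucket, and the key's identity is still confirmed with Equals() 
on every access:

    using System;
    using System.Collections;

    public class CollisionSafeWeakMapSketch
    {
        private class Entry
        {
            public readonly WeakReference Key;
            public object Value;
            public Entry(object key, object value)
            {
                Key = new WeakReference(key);
                Value = value;
            }
        }

        // Maps a hash code to an ArrayList of Entry objects (a bucket), so
        // two keys sharing a hash code each keep their own entry.
        private readonly Hashtable buckets = new Hashtable();

        public object Get(object key)
        {
            ArrayList bucket = (ArrayList)buckets[key.GetHashCode()];
            if (bucket == null) return null;
            foreach (Entry e in bucket)
            {
                object target = e.Key.Target; // strong reference while comparing
                if (target != null && target.Equals(key))
                    return e.Value;
            }
            return null;
        }

        public void Put(object key, object value)
        {
            ArrayList bucket = (ArrayList)buckets[key.GetHashCode()];
            if (bucket == null)
                buckets[key.GetHashCode()] = bucket = new ArrayList();
            foreach (Entry e in bucket)
            {
                object target = e.Key.Target;
                if (target != null && target.Equals(key))
                {
                    e.Value = value; // same key: overwrite in place
                    return;
                }
            }
            bucket.Add(new Entry(key, value)); // colliding key: new entry
        }
    }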

2. This class attempts to be a complete port of java.util.WeakHashMap, but:
- A correct implementation is difficult to write, and .Net does not have an 
equivalent of java.lang.ref.ReferenceQueue to accelerate the cleanup process.
- Since the behaviour of the WeakHashMap is closely tied to the behaviour of 
the garbage collector, it is very difficult to test properly, if at all.
- Lucene only uses the get(Object key) and put(K key, V value) methods.
- In Lucene, the keys are IndexReader instances.  In normal use cases there 
will be only one live IndexReader, and perhaps another one if 'warming up' is 
used before switching IndexSearchers after an index update.
- If this class is presented as a complete port of java.util.WeakHashMap, 
users might assume it is production-quality code and copy/paste it into their 
own projects.  Any further bugs found might harm the credibility of the Lucene 
project, even if those sections are never used in Lucene.  (E.g.: the 
collection returned by Keys might contain null values; there's no guarantee 
the GC won't intervene between the call to Cleanup and the calls to 
keys.Add(w.Key.Target).)

3. Most support classes that are needed for the conversion from Java to .Net 
live in the anonymous namespace (i.e. the global namespace), but this one is 
in Lucene.Net.Util.  I would propose keeping the namespaces that correspond to 
the original Java packages clean, and putting all support classes either in 
the anonymous namespace or in a Lucene.Net-specific one (Lucene.Net.DotNetPort 
?).  (I see that George already moved it to SupportClass, thanks George!)

I have been using the Java version of Lucene in a project for a long time.
For a new project that is currently under development, we will be using the 
.Net version.
For now we are on the 2.1 version, with a custom patch to avoid the memory 
leak problem.

I'll post an up-to-date version of this patch as a replacement for the current 
implementation of Lucene.Net.Util.WeakHashMap sometime next week...
- It doesn't try to be a complete implementation of WeakHashMap: only the 
methods that are strictly necessary are implemented; all other methods throw a 
NotImplementedException.  (It should probably be renamed to 
SimplifiedWeakHashMap or something.)
- It's optimized for the 'low number of keys' case.
- It's simple code, but I want to test it thoroughly first.  The original 
patch was put directly in the FieldCacheImpl code, and I want to make sure I 
didn't make some stupid mistake while converting it.
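
To give an idea of the shape, something along these lines (a rough sketch 
under placeholder naming, not the actual patch):

    using System;
    using System.Collections;

    // Rough sketch: only Get and Put are supported; everything else throws so
    // that accidental reliance on unported methods is caught immediately.  A
    // plain list scan is fine for a handful of live IndexReader keys but
    // would not scale to large maps.
    public class SimplifiedWeakHashMap
    {
        private class Entry
        {
            public readonly WeakReference Key;
            public object Value;
            public Entry(object key, object value)
            {
                Key = new WeakReference(key);
                Value = value;
            }
        }

        private readonly ArrayList entries = new ArrayList(); // of Entry

        public object Get(object key)
        {
            foreach (Entry e in entries)
            {
                object target = e.Key.Target; // strong reference while comparing
                if (target != null && target.Equals(key))
                    return e.Value;
            }
            return null;
        }

        public void Put(object key, object value)
        {
            // Purge entries whose keys have been collected; this is what the
            // original Hashtable-based cache never did, hence the leak.
            for (int i = entries.Count - 1; i >= 0; i--)
                if (((Entry)entries[i]).Key.Target == null)
                    entries.RemoveAt(i);

            foreach (Entry e in entries)
            {
                object target = e.Key.Target;
                if (target != null && target.Equals(key))
                {
                    e.Value = value;
                    return;
                }
            }
            entries.Add(new Entry(key, value));
        }

        public ICollection Keys { get { throw new NotImplementedException(); } }
        public ICollection Values { get { throw new NotImplementedException(); } }
    }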

I have an account on Jira, but I don't have rights to reopen the issue...

Regards,

Luc


-----Original Message-----
From: Digy (JIRA) [mailto:j...@apache.org] 
Sent: zaterdag 27 december 2008 19:47
To: lucene-net-dev@incubator.apache.org
Subject: [jira] Closed: (LUCENENET-106) Lucene.NET (Revision: 603121) is leaking memory


     [ https://issues.apache.org/jira/browse/LUCENENET-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Digy closed LUCENENET-106.
--------------------------

    Resolution: Fixed
      Assignee: Digy

Patches are applied.

DIGY.

> Lucene.NET (Revision: 603121) is leaking memory
> -----------------------------------------------
>
>                 Key: LUCENENET-106
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-106
>             Project: Lucene.Net
>          Issue Type: Bug
>         Environment: .NET 2.0
>            Reporter: Anton K.
>            Assignee: Digy
>            Priority: Critical
>         Attachments: DIGY-FieldCacheImpl.patch, Digy.rar, 
> luceneSrc_memUsage.patch, Paches for v2.3.1.rar, 
> WeakHashTable+FieldCacheImpl.rar, WeakReferences.rar
>
>
> The readerCache Hashtable field (see FieldCacheImpl.cs) never releases hash 
> entries whose key is a closed IndexReader object, so a lot of Term 
> instances are never released.
> The Java version of Lucene uses a WeakHashMap and therefore doesn't have 
> this problem.
> This bug can be reproduced only when the Sort functionality is used during 
> search.  See the following link for additional information.
> http://www.gossamer-threads.com/lists/lucene/java-user/55681
> Steps to reproduce:
> 1) Create an index
> 2) Modify the index with an IndexWriter; close the IndexWriter
> 3) Use an IndexSearcher for searching with Sort; close the IndexSearcher
> 4) Go to step 2
> You'll get an OutOfMemoryException after running this loop for some time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

