RE: Problems with patch for LUCENENET-106

George Aroush Mon, 05 Jan 2009 16:47:24 -0800

The fix should be in the implementation / port of WeakHashMap, and not in
Lucene.Net base code.


I believe Luc is working on providing an alternative patch / fix.

-- George

> -----Original Message-----
> From: Timothy Januario [mailto:timothy.janua...@bdmetrics.com] 
> Sent: Monday, January 05, 2009 10:19 AM
> To: lucene-net-dev@incubator.apache.org
> Subject: RE: Problems with patch for LUCENENET-106
> 
> I stand corrected.  I do see the possibility of the collision 
> as a potentially serious bug.  The java version still uses 
> the object as the key which means that if the hashcode is 
> identical across all instances, it just becomes less 
> efficient.  In the current .NET implementation, it stays very 
> efficient, it would just only ever have a single instance in 
> the cache which is a very different and incorrect result.
> 
> -----Original Message-----
> From: Eyal Post [mailto:eyalp...@epocalipse.com]
> Sent: Saturday, January 03, 2009 1:56 PM
> To: lucene-net-dev@incubator.apache.org
> Subject: RE: Problems with patch for LUCENENET-106
> 
> I tend to agree with Luc. 
> Hashtables don't rely on small chance for collisions to 
> operate correctly.
> They only rely on it to operate efficiently. 
> This means you can store 2 elements with the same hashcode in 
> a hashtable and they won't override each other, but the 
> hashtable lookups will be less efficient with collisions.
> The problem with code provided in the patch is that it uses 
> the hashcode as the key in the hashtable and altough the 
> chance of collisions is small once they happen the results 
> are undefined.
> I think the best way to solve that is to use WeakEntry as the 
> key and override both GetHashCode and Equals. GetHashCode 
> should return the HashCode field and Equals should compare the Keys
> 
> 
> Eyal Post
>  
> 
> > -----Original Message-----
> > From: George Aroush [mailto:geo...@aroush.net]
> > Sent: Friday, January 02, 2009 0:52 AM
> > To: lucene-net-dev@incubator.apache.org
> > Subject: RE: Problems with patch for LUCENENET-106
> > 
> > Hi Luc,
> > 
> > I have to agree with Tim.
> > 
> > The .NET's implementation / port of WeakHashMap in Lucene.Net is 
> > consistent with Java Lucene.  If there is a "very" small 
> chance of a 
> > hash key collusion, it is also possible in the Java version 
> of Lucene.  
> > Here is why.
> > 
> > In Java, when WeakHashMap.put(Object key, Object value) is called, 
> > this function calls key.hashCode() to get a hash value (I 
> had a look 
> > at Java's source code for WeakHashMap.) The same logic is 
> happening in 
> > .NET's implementation / port of WeakHashMap.  In addition, at least 
> > for now, the "key" is always of type IndexReader.  Looking in 
> > IndexReader code, there is no override of Java::hashCode() / 
> > C#::GetHashCode().
> >  So, unless if Java's
> > hashCode() guaranties uniqueness (which it doesn't), then the 
> > possibility of a hash value collusion also exists in the 
> Java version 
> > of Lucene.  If this is the case, then we have a valid 
> defect in Lucene 
> > and it should be addressed.
> > 
> > Erik, do you agree, can you comment?
> > 
> > Regarding #2, & #3, I agree that the .NET implementation of 
> > WeakHashMap could have been confined to just get() / put().
> > Since you already have a patch for Lucene.Net 2.1, please 
> submit a new 
> > JIRA issue and attach your patch to it (when you are 
> ready.)  I look 
> > forward to your patch.
> > 
> > Regards, and Happy New Year everyone!
> > 
> > -- George
> > 
> > 
> > > -----Original Message-----
> > > From: Timothy Januario [mailto:timothy.janua...@bdmetrics.com]
> > > Sent: Wednesday, December 31, 2008 9:44 AM
> > > To: lucene-net-dev@incubator.apache.org
> > > Subject: RE: Problems with patch for LUCENENET-106
> > > 
> > > Luc,
> > > I'm not sure that I completely agree with your assessment of the 
> > > WeakHashMap.
> > > 
> > > It does assume that the hash value is unique which is a
> > correct usage.  
> > > It is the responsibility of
> > > object.GetHashCode() to return a unique value.  This is a
> > problem in
> > > any Hashtable implementation and although I haven't seen
> > the internal
> > > java code, I assume that it has the same limitation.  The
> > object would
> > > be the key only superficially as the hashcode would be used
> > internally
> > > as well (based on the name of the object, WeakHashMap, anyway).
> > > If this is incorrect, I apologize for the noise.  I did, however, 
> > > notice that GetHashCode() is not overridden in the 
> IndexReader and 
> > > since the .NET framework does not guarantee a unique value
> > (according
> > > to the documentation for oject.GetHashCode()), this could
> > present the
> > > problem that you have identified.  Is there a need to guarantee 
> > > uniqueness by overriding the method or does anyone know if
> > the value
> > > is always unique already?  When the Cache object was
> > implemented with
> > > Hashtable prior to this patch, the same problem would have been 
> > > inherent with that implementation (of course with the
> > additional bonus
> > > of the memory leak ;)) and I don't think I ever saw collisions 
> > > although quite honestly I was never really looking for them.
> > > 
> > > Also, you said that the call to WeakHashMap.Keys could 
> return NULL 
> > > values.  This may be the reason why the Java implementation
> > supports
> > > null keys.  It does seem to be a possibility unless the
> > call to Keys
> > > checks that WeakReference.Key.Target is not null at the time of 
> > > iteration (this would eliminate this possibility since we 
> would now 
> > > have a strong reference to the object ie: object key = 
> > > entries[i].Key.Target; if (object != null) 
> keys.Add(object);).  The 
> > > same problem would exist with the Values property.
> > > 
> > > Also, optimizing for few IndexReaders in the WeakHashMap is an 
> > > assumption that specializes the class for IndexReaders.
> > > There is no guarantee that this will be the only use case in the 
> > > future.  The Java documentation says that the same initial
> > parameters
> > > (loadfactor and capacity) as HashMap are used which
> > suggests that it
> > > should be left in the .NET implementation with the same
> > parameters as
> > > the default Hashtable which is its underlying container.
> > > 
> > > Comments?
> > > -tim
> > > 
> > > -----Original Message-----
> > > From: Vanlerberghe, Luc [mailto:luc.vanlerber...@bvdep.com]
> > > Sent: Wednesday, December 31, 2008 4:48 AM
> > > To: lucene-net-dev@incubator.apache.org
> > > Subject: Problems with patch for LUCENENET-106
> > > 
> > > Hi,
> > > 
> > > I reviewed the fix for LUCENENET-106 and I see a couple of issues:
> > > 
> > > 1. The implementation of WeakHashMap in SupportClass.cs has
> > a bug in
> > > that it assumes all keys will have a unique HashCode.
> > > Although the chances of this occurring are *very* small, it
> > cannot be
> > > guaranteed. A collision would mean that the cached value for one 
> > > indexReader could be returned as cached value for another
> > indexReader
> > > (using the way this class is used in Lucene as example)
> > > 
> > > 2. This class attempts to be a complete port of the 
> > > java.util.WeakHashMap, but:
> > > - A correct implementation is difficult to implement, and 
> .Net does 
> > > not have an equivalent of java.lang.ref.ReferenceQueue to
> > accelerate
> > > the cleanup process.
> > > - Since the behaviour of the WeakHashMap is closely tied to the 
> > > behaviour of the garbage collector, it is very difficult to test 
> > > properly, if at all.
> > > - Lucene only uses the get(Object key) and put(K key, V
> > value) methods
> > > - In Lucene, the keys are IndexReader instances.  In normal
> > use cases
> > > there will be only 1 IndexReader live, and perhaps another one if 
> > > 'warming up' is used before switching IndexSearchers 
> after an index 
> > > update.
> > > - If this class is presented as a complete port of 
> > > java.util.WeakHashMap, users might assume this is
> > production quality
> > > code and copy/paste its code in their own projects.
> > > Any further bugs found might harm the credibility of the Lucene 
> > > project, even if those sections are never used in Lucene.
> > (e.g.: The
> > > collection returned by Keys might contain null values.  
> There's no 
> > > guarantee the GC won't intervene between the call to
> > Cleanup and the
> > > calls to keys.Add(w.Key.Target).
> > > 
> > > 3. Most support classes that are needed for the conversion
> > of java to
> > > .Net are in the anonymous namespace, but this one is in 
> > > Lucene.Net.Util.  I would propose to keep the namespaces
> > corresponding
> > > to the original java packages clean and either put all
> > support classes
> > > in the anonymous namespace or in a Lucene.Net specific one 
> > > (Lucene.Net.DotNetPort ?) (I see that George moved it already to 
> > > SupportClass, thanks George!)
> > > 
> > > I have been using the java version of Lucene in a project
> > for a long
> > > time.
> > > For a new project that is currently under development, we will be 
> > > using the .Net version.
> > > For now we are using the 2.1 version using a custom patch
> > to avoid the
> > > memory leak problem.
> > > 
> > > I'll post an up to date version of this patch as a
> > replacement for the
> > > current implementation of Lucene.Net.Util.WeakHashMap 
> sometime next 
> > > week...
> > > - It doesn't try to be a complete implementation of
> > > WeakHashMap: only the methods strictly necessary are
> > implemented, all
> > > other methods throw a NotImplementedException. (It should
> > probably be
> > > renamed to SimplifiedWeakHashMap or something)
> > > - It's optimized for the 'low number of keys' case.
> > > - It's simple code, but I want to test it thoroughly first.  
> > > The original patch was put directly in the FieldCacheImpl
> > code, I want
> > > to make sure I didn't make some stupid mistake while 
> converting it.
> > > 
> > > I have an account on Jira, but I don't have rights to reopen the 
> > > issue...
> > > 
> > > Regards,
> > > 
> > > Luc
> > > 
> > > 
> > > -----Original Message-----
> > > From: Digy (JIRA) [mailto:j...@apache.org]
> > > Sent: zaterdag 27 december 2008 19:47
> > > To: lucene-net-dev@incubator.apache.org
> > > Subject: [jira] Closed: () Lucene.NET (Revision: 603121) 
> is leaking 
> > > memory
> > > 
> > > 
> > >      [
> > > https://issues.apache.org/jira/browse/LUCENENET-106?page=com.a
> > tlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> > > 
> > > Digy closed LUCENENET-106.
> > > --------------------------
> > > 
> > >     Resolution: Fixed
> > >       Assignee: Digy
> > > 
> > > Patches are applied.
> > > 
> > > DIGY.
> > > 
> > > > Lucene.NET (Revision: 603121) is leaking memory
> > > > -----------------------------------------------
> > > >
> > > >                 Key: LUCENENET-106
> > > >                 URL: 
> > > https://issues.apache.org/jira/browse/LUCENENET-106
> > > >             Project: Lucene.Net
> > > >          Issue Type: Bug
> > > >         Environment: .NET 2.0
> > > >            Reporter: Anton K.
> > > >            Assignee: Digy
> > > >            Priority: Critical
> > > >         Attachments: DIGY-FieldCacheImpl.patch, Digy.rar,
> > > luceneSrc_memUsage.patch, Paches for v2.3.1.rar,
> > > WeakHashTable+FieldCacheImpl.rar, WeakReferences.rar
> > > >
> > > >
> > > > readerCache Hashtable field (see FieldCacheImpl.cs) never
> > > releases some hash items that have closed IndexReader
> > object as a key. 
> > > So a lot of Term instances are never released.
> > > > Java version of Lucene uses WeakHashMap and therefore
> > > doesn't have this problem.
> > > > This bug can be reproduced only when Sort functionality
> > > used during search. 
> > > > See following link for additional information.
> > > > http://www.gossamer-threads.com/lists/lucene/java-user/55681
> > > > Steps to reproduce:
> > > > 1)Create index
> > > > 2) Modify index by IndexWiter; Close IndexWriter
> > > > 3) Use IndexSearcher for searching with Sort; Close InexSearcher
> > > > 4) Go to step 2
> > > > You'll get OutOfMemoryException after some time of running
> > > this algorithm.
> > > 
> > > --
> > > This message is automatically generated by JIRA.
> > > -
> > > You can reply to this email to add a comment to the issue online.
> > > 
> > > 
> > > 
> > 
> > 
>

RE: Problems with patch for LUCENENET-106

Reply via email to