[ 
https://issues.apache.org/jira/browse/LUCENE-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567924#comment-13567924
 ] 

Uwe Schindler commented on LUCENE-4740:
---------------------------------------

bq. Yes, that change is probably not a good general solution, but it worked 
well for our usecase. It might be nice to have support for unloadable classes 
optional.

As I said, a change in AttributeSource or VirtualMethod is not needed; the 
total number of per-JVM references there is in the tens. This is perfectly 
fine code and nobody needs to change anything. There is no need for "optional" 
class unloading. *Not* using weak references here would be a major design issue 
and a large leak.

bq. In any case, if the useUnmap is false, then it seems unnecessary to even 
add references to the clones to the map.

Robert and I were already discussing that. We can do it; the patch is easy. 
We can offer it as an option (the no-unmap option), with the downside that 
e.g. Windows can no longer delete index files until they are garbage 
collected, and especially that disk usage is higher while indexing.

I did some testing with various JDKs on 64-bit Windows, using a loop that 
clones one IndexInput over and over. This loop runs successfully for hours 
without OOM, so there is no cleanup problem; the ReferenceQueues are working 
correctly. With a heap size of 512 MB and this simple loop, the number of weak 
references is between 5,000 and 600,000. But indeed, there are some GC pauses 
(in JDK 6 and 7). The reason for this is that weakly referenced objects are a 
little bit more "reachable" than unreachable objects, so the GC lets them 
survive longer than unreachable ones. There is nothing we can do about that. 
The main problem in your case may be the really large heap size: why do you 
need it?
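
To illustrate the cleanup mechanism mentioned above, here is a minimal sketch 
of an identity-based weak set that reaps cleared references through a 
ReferenceQueue. It is *not* Lucene's WeakIdentityMap; the class name 
WeakIdentitySetSketch and all of its methods are made up for this example, and 
it only assumes the standard java.lang.ref and java.util.concurrent classes.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch for illustration; not Lucene's WeakIdentityMap. */
final class WeakIdentitySetSketch<T> {

  /** Weak key compared by identity, so equals()/hashCode() of T are never called. */
  private static final class IdentityWeakRef<T> extends WeakReference<T> {
    private final int hash;

    IdentityWeakRef(T referent, ReferenceQueue<T> queue) {
      super(referent, queue);
      // Cache the hash so the key can still be found after the referent is cleared.
      this.hash = System.identityHashCode(referent);
    }

    @Override
    public int hashCode() {
      return hash;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof IdentityWeakRef)) return false;
      Object mine = get();
      return mine != null && mine == ((IdentityWeakRef<?>) o).get();
    }
  }

  private final ReferenceQueue<T> queue = new ReferenceQueue<T>();
  private final ConcurrentHashMap<IdentityWeakRef<T>, Boolean> backing =
      new ConcurrentHashMap<IdentityWeakRef<T>, Boolean>();

  void add(T item) {
    reap();
    backing.put(new IdentityWeakRef<T>(item, queue), Boolean.TRUE);
  }

  boolean remove(T item) {
    reap();
    // Temporary lookup key; it is not registered with the queue.
    return backing.remove(new IdentityWeakRef<T>(item, null)) != null;
  }

  int size() {
    reap();
    return backing.size();
  }

  /** Drops entries whose referents the GC has already cleared and enqueued. */
  private void reap() {
    Reference<? extends T> cleared;
    while ((cleared = queue.poll()) != null) {
      backing.remove(cleared);
    }
  }
}
```

In this sketch, entries only disappear once the GC has cleared the reference 
and a later call polls the queue, which is why the observed size fluctuates 
when nothing removes entries explicitly.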

My second test closed every cloned index input (trunk/4.x only, where I added 
the commit you mentioned one week ago). In that case the number of references 
was of course a static "1" :-) In this test, no GC pauses occurred and the 
test ran faster.

In my final test I disabled the put() to the WeakIdentityMap completely. In 
that case it was faster again, but this was caused more by the complete 
absence of any locking or maintenance of the ConcurrentHashMap.

The times for 300 million clones:
- With default Lucene 4.x/trunk, no close of clones _(Lucene never closes 
clones and that's almost impossible to add)_: 200 secs, GC pauses
- With closing clones: 65 secs
- Without any map: 40 secs

(JDK 6u32, Windows, 64-bit, server VM, default garbage collector)

{code:java}
  // for this test, make the clones map in ByteBufferIndexInput
  // public/package-private/...
  public void testGC() throws Exception {
    MMapDirectory mmapDir = new MMapDirectory(_TestUtil.getTempDir("testGC"));
    IndexOutput io = mmapDir.createOutput("bytes", newIOContext(random()));
    io.writeVInt(5);
    io.close();
    IndexInput ii = mmapDir.openInput("bytes", IOContext.DEFAULT);
    int hash = 0;
    for (int i = 0; i < 300*1024*1024; i++) {
      final IndexInput clone = ii.clone();
      hash += System.identityHashCode(clone);
      if (i % (10*1024) == 0) {
        System.out.println("Number of clones: "
            + ((ByteBufferIndexInput) ii).clones.size());
      }
      //clone.close();
    }
    ii.close();
    mmapDir.close();
  }
{code}

In any case, we can allow users to disable unmap, but we then have to keep the 
weak references to the clones when unmapping is enabled, unless we add close() 
of clones to Lucene everywhere...
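
The trade-off described above can be sketched roughly as follows. This is a 
hypothetical stand-in, not ByteBufferIndexInput: the class TrackedInputSketch, 
its cloneInput()/trackedClones() methods and the useUnmap constructor flag are 
invented for illustration. The master tracks clones weakly only when unmapping 
is enabled; a clone that is closed explicitly removes itself, and closing the 
master invalidates whatever clones are still tracked.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for a mapped input; not Lucene's ByteBufferIndexInput. */
final class TrackedInputSketch {
  private final TrackedInputSketch parent;              // null for the master
  private final Map<TrackedInputSketch, Boolean> clones; // null when clones are not tracked
  private volatile boolean open = true;

  /** Creates a master; clones are tracked only when useUnmap is true. */
  TrackedInputSketch(boolean useUnmap) {
    this.parent = null;
    // This class does not override equals()/hashCode(), so WeakHashMap
    // effectively compares keys by identity here.
    this.clones = useUnmap
        ? Collections.synchronizedMap(new WeakHashMap<TrackedInputSketch, Boolean>())
        : null;
  }

  private TrackedInputSketch(TrackedInputSketch master) {
    this.parent = master;
    this.clones = null; // clones register with the master's map
  }

  TrackedInputSketch cloneInput() {
    if (!open) throw new IllegalStateException("already closed");
    TrackedInputSketch master = (parent == null) ? this : parent;
    TrackedInputSketch c = new TrackedInputSketch(master);
    if (master.clones != null) {
      master.clones.put(c, Boolean.TRUE);
    }
    return c;
  }

  void close() {
    if (!open) return;
    open = false;
    if (parent != null) {
      // An explicitly closed clone unregisters itself, so the map stays small.
      if (parent.clones != null) parent.clones.remove(this);
      return;
    }
    if (clones != null) {
      // Master with tracking: invalidate every clone that is still alive.
      synchronized (clones) {
        for (TrackedInputSketch c : clones.keySet()) {
          c.open = false;
        }
        clones.clear();
      }
    }
    // Without tracking, clones stay usable until they are garbage collected.
  }

  boolean isOpen() {
    return open;
  }

  int trackedClones() {
    return clones == null ? 0 : clones.size();
  }
}
```

With tracking disabled, the sketch has no way to reach the clones when the 
master is closed, which corresponds to the no-unmap case where resources are 
only released by garbage collection.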

Some other ideas: reuse the ByteBufferIndexInput instances, so we don't need 
to recreate them all the time. I have no idea how to do that, because we have 
no close() to release them, which brings us back to that problem again.
                
> Weak references cause extreme GC churn
> --------------------------------------
>
>                 Key: LUCENE-4740
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4740
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/store
>    Affects Versions: 3.6.1
>         Environment: Linux debian squeeze 64 bit, Oracle JDK 6, 32 GB RAM, 16 
> cores
>            Reporter: Kristofer Karlsson
>            Priority: Critical
>
> We are running a set of independent search machines, running our custom 
> software using lucene as a search library. We recently upgraded from lucene 
> 3.0.3 to 3.6.1 and noticed a severe degradation of performance.
> After doing some heap dump digging, it turns out the process is stalling 
> because it's spending so much time in GC. We noticed about 212 million 
> WeakReference, originating from WeakIdentityMap, originating from 
> MMapIndexInput.
> Our problem completely went away after removing the clones weakhashmap from 
> MMapIndexInput, and as a side-effect, disabling support for explictly 
> unmapping the mmapped data.
