The problem appears to be this.  We have an index of approximately 1
million items.  It uses 6 parallel subindexes with ParallelReader, so
each of these subindexes also has 1 million items.  Each subindex has
the same segment structure, currently with 15 segments in each.

I mentioned before that the issue arose just after a deleteAdd update,
which closed the reader after the deletes, made the additions with the
writer, and then reopened the reader.

We have been using a default sort that looks at score first and then at
the id of the item.  Each id is unique, with an integer sort field.  So
the first query after the IndexReader refresh has to create a new
FieldCache comparator for this integer field.  That generates a
ParallelTermDocs that iterates the id field, which of course lives in
only one of the subindexes.  So we have to build a field cache with
1,000,000 entries, which requires cloning the freqStream in the
SegmentReader for each segment.  As I interpret the code, this should
be only 15 clones.

There were 4 threads doing this simultaneously, so make that 60 clones.
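For concreteness, here is a toy sketch of what building the int field
cache amounts to (hypothetical code with a made-up inverted-index shape,
not the actual FieldCacheImpl logic): every unique term of the sort
field is parsed once, and its value is recorded for each document that
contains it.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model: term text -> ids of the docs containing it.  With one unique
// id per document, filling the cache visits as many terms as documents.
public class FieldCacheSketch {

    public static int[] buildIntCache(SortedMap<String, int[]> postingsByTerm,
                                      int maxDoc) {
        int[] cache = new int[maxDoc];
        for (Map.Entry<String, int[]> e : postingsByTerm.entrySet()) {
            int value = Integer.parseInt(e.getKey()); // parse the term once
            for (int doc : e.getValue()) {            // record it per doc
                cache[doc] = value;
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("10", new int[] {0});
        postings.put("20", new int[] {1});
        postings.put("30", new int[] {2});
        System.out.println(Arrays.toString(buildIntCache(postings, 3))); // [10, 20, 30]
    }
}
```

With 1,000,000 unique ids this loop visits 1,000,000 terms, but the
TermDocs (and hence the freqStream clone) is opened only once per
segment, which is where the 15-clones-per-thread figure comes from.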

I can see that reading 1 million terms and building the comparator takes
a while, although not the 15-20 minutes it does, and I am baffled that
every thread dump across many trials of this issue ends up with every
thread inside clone()!  The clone just doesn't do much; the most
expensive step is copying the 1024-byte buffer in BufferedIndexInput.
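For reference, a stripped-down sketch of what a BufferedIndexInput-style
clone does (a hypothetical stand-in class modeled on the pattern, not
code copied from Lucene): super.clone() duplicates the field values, and
the only real work is the 1024-byte buffer copy.

```java
// Hypothetical stand-in for a buffered input: clone() duplicates the
// fields via Object.clone() and then deep-copies the 1 KB buffer,
// which is the most expensive step the method itself performs.
public class BufferedInput implements Cloneable {
    public static final int BUFFER_SIZE = 1024;
    public byte[] buffer = new byte[BUFFER_SIZE];
    public long filePointer;

    @Override
    public BufferedInput clone() {
        try {
            BufferedInput copy = (BufferedInput) super.clone();
            copy.buffer = buffer.clone(); // one 1024-byte array copy
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we implement Cloneable
        }
    }

    public static void main(String[] args) {
        BufferedInput a = new BufferedInput();
        a.buffer[0] = 42;
        BufferedInput b = a.clone();
        a.buffer[0] = 7;
        System.out.println(b.buffer[0]); // 42: the clone's buffer is independent
    }
}
```

The dumps suggest the time is going not into this copy but into
Object.clone() registering a java.lang.ref.Finalizer for the new object,
which the JVM does whenever the cloned class declares a finalize()
method.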

Applying the patch moved the issue somewhat, but not materially.  The
setup of the FieldCache comparator still takes the same amount of time,
and all thread dumps still find the stack inside Object.clone() working
on finalizers.

I'll study this further and look for an optimization, submitting a patch
if I find one.  One interesting thing is that all 4 threads
simultaneously running this query appear to be building a field cache.
It seems the synchronization in FieldCacheImpl.get() with the
CreationPlaceHolder should have prevented that, but for some reason it
does not.
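For anyone following along, this is the pattern I understand that code
to intend (a simplified sketch with hypothetical names, not the actual
FieldCacheImpl source): the first thread to miss installs a placeholder
and runs the expensive creation outside the global lock, while later
threads for the same key block on the placeholder instead of recomputing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a placeholder-based cache: the creator function should run
// at most once per key, no matter how many threads ask concurrently.
public class PlaceholderCache {
    private final Map<Object, Object> map = new HashMap<>();

    private static final class Placeholder {
        Object value; // set exactly once, under the placeholder's lock
    }

    public Object get(Object key, Function<Object, Object> creator) {
        Placeholder p;
        synchronized (map) {
            Object existing = map.get(key);
            if (existing == null) {
                p = new Placeholder();
                map.put(key, p);            // we will do the creation
            } else if (existing instanceof Placeholder) {
                p = (Placeholder) existing; // someone else is creating: wait
            } else {
                return existing;            // already created
            }
        }
        synchronized (p) {
            if (p.value == null) {
                p.value = creator.apply(key); // only one thread gets here first
                synchronized (map) {
                    map.put(key, p.value);    // replace placeholder with value
                }
            }
            return p.value;
        }
    }

    public static void main(String[] args) {
        PlaceholderCache cache = new PlaceholderCache();
        Object v = cache.get("id", k -> "comparator for " + k);
        System.out.println(v); // comparator for id
    }
}
```

If this pattern were working as intended, only one of the 4 threads
should pay the comparator-construction cost; the dumps suggest all 4
are paying it.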

Any further suggestions would be welcome!

For easy access, here is the thread dump again without the patch:

> === Thread Connection thread group.HttpConnection-26493-7 ===
> java.lang.ref.Finalizer.add(Unknown Source)
>         java.lang.ref.Finalizer.<init>(Unknown Source)
>         java.lang.ref.Finalizer.register(Unknown Source)
>         java.lang.Object.clone(Native Method)
>         org.apache.lucene.store.IndexInput.clone(IndexInput.java:175)
>         org.apache.lucene.store.BufferedIndexInput.clone(BufferedIndexInput.java:128)
>         org.apache.lucene.store.FSIndexInput.clone(FSDirectory.java:562)
>         org.apache.lucene.index.SegmentTermDocs.<init>(SegmentTermDocs.java:45)
>         org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:333)
>         org.apache.lucene.index.MultiTermDocs.termDocs(MultiReader.java:416)
>         org.apache.lucene.index.MultiTermDocs.termDocs(MultiReader.java:409)
>         org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:361)
>         org.apache.lucene.index.ParallelReader$ParallelTermDocs.next(ParallelReader.java:353)
>         org.apache.lucene.search.FieldCacheImpl$3.createValue(FieldCacheImpl.java:173)
>         org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>         org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:154)
>         org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:148)
>         org.apache.lucene.search.FieldSortedHitQueue.comparatorInt(FieldSortedHitQueue.java:204)
>         org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:175)
>         org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>         org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:155)
>         org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
>         org.apache.lucene.search.TopFieldDocCollector.<init>(TopFieldDocCollector.java:41)

And here is the top of the stack with the patch (rest is the same):

> === Thread Connection thread group.HttpConnection-26493-3 ===
> java.lang.ref.Finalizer.<init>(Unknown Source)
> java.lang.ref.Finalizer.register(Unknown Source)
> java.lang.Object.clone(Native Method)
> org.apache.lucene.store.IndexInput.clone(IndexInput.java:175)
> org.apache.lucene.store.BufferedIndexInput.clone(BufferedIndexInput.java:128)
> org.apache.lucene.store.FSIndexInput.clone(FSDirectory.java:564)
> org.apache.lucene.index.SegmentTermDocs.<init>(SegmentTermDocs.java:45)


Thanks,

Chuck


Chuck Williams wrote on 12/15/2006 08:22 AM:
> Yonik and Robert, thanks for the suggestions and pointer to the patch!
>
> We've looked at the synchronization involved with finalizers and don't
> see how it could cause the issue as running the finalizers themselves is
> outside the lock.  The code inside the lock is simple fixed-time list
> manipulation, not even a loop.  On the other hand, we don't see how
> anything else could cause the problem either.
>
> We'll try the patch and let everybody know if it resolves the issue. 
> Getting rid of the finalizer will hopefully at least circumvent the problem.
>
> Thanks!
>
> Chuck
>
>
> robert engels wrote on 12/15/2006 07:23 AM:
>   
>> I don't think there is necessarily a limit, but I am assuming there is
>> some interaction between the finalizer or GC threads and adding new
>> finalizers, which would require some sort of synchronization.
>>
>> Do you use network resources in your app? I have a hunch this is
>> somehow the cause.
>>
>>
>> On Dec 15, 2006, at 11:20 AM, Yonik Seeley wrote:
>>
>>     
>>> On 12/15/06, robert engels <[EMAIL PROTECTED]> wrote:
>>>       
>>>> If you could post a complete thread dump that would be better.
>>>>
>>>> I am "thinking" that it is not really a bug, but that the finalizer
>>>> or GC thread is very busy, or possibly blocked on a network resource,
>>>> and that is preventing the addition of additional finalizers.
>>>>         
>>> I didn't know there was a limit to the number of finalizers... but it
>>> certainly sounds plausible.
>>> I think we should still get rid of all the unnecessary finalizers in
>>> IndexInput though.
>>>
>>> -Yonik
>>> http://incubator.apache.org/solr Solr, the open-source Lucene search
>>> server
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
>>>       
>
>
>


