[jira] [Commented] (LUCENE-4930) Lucene's use of WeakHashMap at index time prevents full use of cores on some multi-core machines, due to contention

Christian Ziech (JIRA) Fri, 12 Apr 2013 10:04:18 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630329#comment-13630329
 ]


Christian Ziech commented on LUCENE-4930:
-----------------------------------------

I have checked the Java we are using, and it is the latest Java 6 OpenJDK 
available for Ubuntu 12.04 LTS (6b24-1.11.5-0ubuntu1~12.04.1). 
There is a newer one available for 12.10 (6b27-1.12.3-0ubuntu1~12.10.1), but 
the issue is not fixed in that version either, so there is no way around 
that code without upgrading to Java 7.

The relevant code from the reference queue looks as follows:
{noformat}
     89     /**
     90      * Polls this queue to see if a reference object is available.  If one is
     91      * available without further delay then it is removed from the queue and
     92      * returned.  Otherwise this method immediately returns <tt>null</tt>.
     93      *
     94      * @return  A reference object, if one was immediately available,
     95      *          otherwise <code>null</code>
     96      */
     97     public Reference<? extends T> poll() {
     98         synchronized (lock) {
     99             return reallyPoll();
    100         }
    101     }
{noformat}

So it seems that Uwe is right: our Java version does not do the double 
check here before actually entering the synchronized block.
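
To make the difference concrete, here is a minimal sketch of a poll() that 
checks for emptiness before taking the lock. FastPathQueue and its fields are 
hypothetical names used only for illustration, not the JDK's ReferenceQueue; 
the point is that an empty queue, which should be the common case, returns 
without ever entering the monitor.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a reference queue; NOT the JDK class. */
final class FastPathQueue<T> {
    private final Object lock = new Object();
    private final Deque<T> items = new ArrayDeque<>();
    // volatile so the lock-free emptiness check sees writes made under the lock
    private volatile int size = 0;

    void enqueue(T item) {
        synchronized (lock) {
            items.addLast(item);
            size = items.size();
        }
    }

    /** Returns the next item, or null if none is immediately available. */
    T poll() {
        // Fast path: nothing queued, so skip the monitor entirely.
        if (size == 0) {
            return null;
        }
        synchronized (lock) {
            // Re-check under the lock: another thread may have drained the
            // queue in between; pollFirst() returns null in that case.
            T item = items.pollFirst();
            size = items.size();
            return item;
        }
    }
}

final class FastPathQueueDemo {
    public static void main(String[] args) {
        FastPathQueue<String> q = new FastPathQueue<>();
        System.out.println(q.poll()); // prints "null" without taking the lock
        q.enqueue("ref");
        System.out.println(q.poll()); // prints "ref"
        System.out.println(q.poll()); // prints "null"
    }
}
```

With the snippet quoted above, every caller of poll() contends on the same 
lock even when the queue is empty, which matches the BLOCKED threads in the 
trace.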

However, I am not sure I understand why Lucene uses a weak-keyed map 
(WeakIdentityMap) here. 
I may be wrong, but wouldn't that reap only have work to do when the interface 
class itself is unloaded? That should be extremely rare, shouldn't it? If I 
understand the purpose of that code correctly, it is meant to avoid wasting 
memory in cases where the user does incremental indexing only from time to time. 
Without weak keys, the attribute source would prevent the interface class and 
implementation class from being garbage collected in the meantime. But is that 
case really worth the effort (I don't know how big the memory 
footprint of an Attribute implementation _class_ usually is)? I mean, that 
would only affect the static fields here (and in plain Lucene I could not find 
many of those) ...
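
For comparison, a strongly referencing cache could look roughly like the 
sketch below. This is only an illustration of the trade-off, not Lucene's 
code: StrongClassCache, Greeting and GreetingImpl are made-up names, and the 
"interface name + Impl" lookup is assumed here as the naming convention. Such 
a cache never polls a reference queue, so lookups do not contend on a lock, 
but it keeps every cached class (and its class loader) reachable for as long 
as the cache itself lives.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical example interface and implementation, for illustration only. */
interface Greeting {}

final class GreetingImpl implements Greeting {}

/** Hypothetical cache holding strong references to interface/impl classes. */
final class StrongClassCache {
    private final ConcurrentMap<Class<?>, Class<?>> cache =
        new ConcurrentHashMap<>();

    Class<?> implFor(Class<?> iface) {
        // Lock-free read in the common (already cached) case.
        Class<?> impl = cache.get(iface);
        if (impl == null) {
            try {
                // Assumed convention: implementation is named "<Interface>Impl".
                impl = Class.forName(iface.getName() + "Impl", true,
                                     iface.getClassLoader());
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException(
                    "No implementation found for " + iface.getName(), e);
            }
            // If another thread won the race, keep its entry.
            Class<?> previous = cache.putIfAbsent(iface, impl);
            if (previous != null) {
                impl = previous;
            }
        }
        return impl;
    }
}

final class StrongClassCacheDemo {
    public static void main(String[] args) {
        StrongClassCache cache = new StrongClassCache();
        Class<?> impl = cache.implFor(Greeting.class);
        System.out.println(impl.getSimpleName()); // prints "GreetingImpl"
    }
}
```

Whether pinning the classes like this is acceptable is exactly the open 
question above; it depends on how much memory those classes hold.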


                
> Lucene's use of WeakHashMap at index time prevents full use of cores on some 
> multi-core machines, due to contention
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4930
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4930
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.2
>         Environment: Dell blade system with 16 cores
>            Reporter: Karl Wright
>         Attachments: thread_dump.txt
>
>
> Our project is not optimally using full processing power under 
> indexing load on Lucene 4.2.0.  The reason is the 
> AttributeSource.addAttribute() method, which goes through a WeakHashMap 
> synchronizer, which is apparently single-threaded for a significant amount of 
> time.  Have a look at the following trace:
> "pool-1-thread-28" prio=10 tid=0x00007f47fc104800 nid=0x672b waiting for 
> monitor entry [0x00007f47d19ed000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:98)
>         - waiting to lock <0x00000005c5cd9988> (a 
> java.lang.ref.ReferenceQueue$Lock)
>         at 
> org.apache.lucene.util.WeakIdentityMap.reap(WeakIdentityMap.java:189)
>         at org.apache.lucene.util.WeakIdentityMap.get(WeakIdentityMap.java:82)
>         at 
> org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.getClassForInterface(AttributeSource.java:74)
>         at 
> org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.createAttributeInstance(AttributeSource.java:65)
>         at 
> org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271)
>         at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:107)
>         at 
> org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:254)
>         at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
>         at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
>         at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
>         at 
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1148)
>         at 
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1129)
> …
> We’ve had to make significant changes to the way we were indexing in order to 
> not hit this issue as much, such as indexing using TokenStreams which we 
> reuse, when it would have been more convenient to index with just tokens.  
> (The reason is that Lucene internally creates TokenStream objects when you 
> pass a token array to IndexableField, and doesn’t reuse them, and the 
> addAttribute() causes massive contention as a result.)  However, as you can 
> see from the trace above, we’re still running into contention due to other 
> addAttribute() method calls that are buried deep inside Lucene.
> I can see two ways forward.  Either not use WeakHashMap or use it in a more 
> efficient way, or make darned sure no addAttribute() calls are done in the 
> main code indexing execution path.  (I think it would be easy to fix 
> DocInverterPerField in that way, FWIW.  I just don’t know what we’ll run into 
> next.)

