Karl Wright created LUCENE-4930:
-----------------------------------

             Summary: Lucene's use of WeakHashMap at index time prevents full 
use of cores on some multi-core machines, due to contention
                 Key: LUCENE-4930
                 URL: https://issues.apache.org/jira/browse/LUCENE-4930
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/index
    Affects Versions: 4.2
         Environment: Dell blade system with 16 cores
            Reporter: Karl Wright


Our project does not make full use of available processing power while under indexing load on Lucene 4.2.0.  The culprit is the AttributeSource.addAttribute() method, which goes through a WeakHashMap-style cache (WeakIdentityMap) whose cleanup polls a shared ReferenceQueue under a lock, effectively single-threading callers for a significant amount of time.  Have a look at the following trace:

"pool-1-thread-28" prio=10 tid=0x00007f47fc104800 nid=0x672b waiting for monitor entry [0x00007f47d19ed000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:98)
        - waiting to lock <0x00000005c5cd9988> (a java.lang.ref.ReferenceQueue$Lock)
        at org.apache.lucene.util.WeakIdentityMap.reap(WeakIdentityMap.java:189)
        at org.apache.lucene.util.WeakIdentityMap.get(WeakIdentityMap.java:82)
        at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.getClassForInterface(AttributeSource.java:74)
        at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.createAttributeInstance(AttributeSource.java:65)
        at org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:107)
        at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:254)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1148)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1129)
…

We’ve had to make significant changes to the way we index in order to avoid hitting this issue as much, such as indexing through TokenStream instances that we reuse, when it would have been more convenient to index with plain token arrays.  (The reason is that when you pass a token array to an IndexableField, Lucene internally creates a fresh TokenStream object and doesn’t reuse it, and the resulting addAttribute() calls cause massive contention.)  However, as you can see from the trace above, we’re still running into contention from other addAttribute() calls that are buried deep inside Lucene.
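For reference, the reuse pattern looks roughly like this.  This is a minimal, self-contained sketch in plain JDK code, not Lucene's actual API: the class names and the lookup counter are illustrative only.  The point is that the synchronized map lookup is paid once at construction time instead of once per document:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Toy model of the contended path: every addAttribute() call goes through a
// synchronized WeakHashMap, analogous to what DefaultAttributeFactory does.
final class ToyAttributeSource {
    static final Map<Class<?>, String> CACHE =
        Collections.synchronizedMap(new WeakHashMap<Class<?>, String>());
    static int lookups = 0;  // counts trips through the global lock

    static String addAttribute(Class<?> attClass) {
        synchronized (CACHE) {           // the lock all indexing threads pile up on
            lookups++;
            String impl = CACHE.get(attClass);
            if (impl == null) {
                impl = attClass.getSimpleName() + "Impl";
                CACHE.put(attClass, impl);
            }
            return impl;
        }
    }
}

// Reused "token stream": addAttribute() happens once in the field initializer;
// per-document state is fed in through reset() instead.
final class ReusableToyStream {
    final String termAtt = ToyAttributeSource.addAttribute(CharSequence.class);
    String current;

    void reset(String token) { current = token; }
}
```

Feeding N documents through one ReusableToyStream performs a single synchronized lookup, whereas letting Lucene build a fresh TokenStream per field repeats that lookup N times.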

I can see two ways forward: either stop using WeakHashMap (or use it in a more efficient way), or make darned sure no addAttribute() calls are made on the main indexing execution path.  (I think it would be easy to fix DocInverterPerField in that way, FWIW.  I just don’t know what we’ll run into next.)
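As one possible shape for the first option, here is a sketch (not a patch, and the "Impl" naming convention is illustrative only) of replacing the synchronized WeakIdentityMap lookup with java.lang.ClassValue (Java 7+): get() is lock-free on the hit path, and the Class keys are weakly held by the JVM, so class unloading still works without any ReferenceQueue.poll() on the caller's path:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical contention-free cache from attribute interface to impl-class name.
final class AttributeImplCache {
    static final AtomicInteger computations = new AtomicInteger();

    private static final ClassValue<String> IMPL_NAME = new ClassValue<String>() {
        @Override protected String computeValue(Class<?> iface) {
            computations.incrementAndGet();  // runs once per interface, not per call
            // In Lucene this would resolve e.g. TermAttribute -> TermAttributeImpl;
            // appending "Impl" here is just a stand-in for that resolution.
            return iface.getName() + "Impl";
        }
    };

    static String implNameFor(Class<?> iface) {
        return IMPL_NAME.get(iface);  // no shared lock, no reap()
    }
}
```

Repeated calls for the same interface hit the per-Class cache and never touch a global monitor, which is exactly the property the trace above shows we're missing.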


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
