[jira] [Issue Comment Edited] (LUCENE-3653) Lucene Search not scalling

Uwe Schindler (Issue Comment Edited) (JIRA) Mon, 19 Dec 2011 07:29:58 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172338#comment-13172338
 ]


Uwe Schindler edited comment on LUCENE-3653 at 12/19/11 3:28 PM:
-----------------------------------------------------------------

bq. IndexSearch.doc calls RAMFile indirectly through RAMInputStream. Once the 
search is complete you need to know what has been found, so IndexSearch.doc 
needs to be called, i.e. RAMInputStream calls RAMFile.numBuffers and getBuffer 
in switchCurrentBuffer, which happens a lot during my searches.

If you use RAMDirectory on a large index, it slows things down because it 
drives the garbage collector crazy. Use an on-disk index with MMapDirectory, 
which has no locking at all apart from the occasionally called 
IndexInput.clone (and if you remove that lock, your JVM will SIGSEGV when 
Lucene is used incorrectly from multiple threads).

RAMDirectory is written for tests, not for production use. There are already 
plans to remove it from Lucene trunk and move it to the test code only. Have 
you seen that it allocates its buffers in 8 KB blocks? Calculate how many 
byte[] objects you have for a 50 GB index... The GC will go crazy when it 
starts to clean up. And then it stops your whole application, not because of 
locking inside RAMFile, but because it does a stop-the-world GC.
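
The buffer count can be worked out with a back-of-the-envelope sketch. It 
assumes binary units (1 KB = 1024 bytes) and that the whole index sits in 
fixed-size 8 KB buffers; the class and method names are made up for 
illustration and are not Lucene code.

```java
public class RamBufferCount {

    /** Number of fixed-size buffers needed to hold indexBytes, rounded up. */
    public static long bufferCount(long indexBytes, long bufferBytes) {
        return (indexBytes + bufferBytes - 1) / bufferBytes;
    }

    public static void main(String[] args) {
        long indexBytes = 50L * 1024 * 1024 * 1024; // 50 GB, assuming binary units
        long bufferBytes = 8L * 1024;               // 8 KB per buffer, as stated in the comment
        System.out.println(bufferCount(indexBytes, bufferBytes)
                + " byte[] objects on the heap");
    }
}
```

Under these assumptions that is roughly 6.5 million separate arrays that the 
collector has to track.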

We are working on a RAMDirectory-like approach that stores the files outside 
the Java heap using a large DirectByteBuffer (which is the same code as 
MMapDirectory). The problem is writing to such a directory, but reading is as 
fast as (or even faster than) RAMDirectory, without locks.
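
A minimal sketch of the off-heap idea, using only java.nio. This is not the 
planned Lucene implementation; OffHeapFileSketch, openReader and sum are 
hypothetical names. It only illustrates why reads need no lock: each reader 
works on its own duplicate of a read-only direct buffer, so readers share the 
bytes but not the position.

```java
import java.nio.ByteBuffer;

public class OffHeapFileSketch {

    // Direct buffer: the bytes live outside the Java heap, so the GC sees
    // one small wrapper object instead of many byte[] blocks.
    private final ByteBuffer master;

    public OffHeapFileSketch(byte[] content) {
        ByteBuffer buf = ByteBuffer.allocateDirect(content.length);
        buf.put(content);
        buf.flip();
        // Never mutated after construction, so it is safe to duplicate
        // from several threads.
        this.master = buf.asReadOnlyBuffer();
    }

    /** Returns a view with its own position and limit over the same memory. */
    public ByteBuffer openReader() {
        return master.duplicate();
    }

    /** Reads the view from its current position to its limit and sums the bytes. */
    public static long sum(ByteBuffer reader) {
        long total = 0;
        while (reader.hasRemaining()) {
            total += reader.get();
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        final OffHeapFileSketch file = new OffHeapFileSketch(new byte[] {1, 2, 3, 4});
        // Two threads read concurrently without any synchronization.
        Runnable task = () -> System.out.println(sum(file.openReader()));
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

The buffer in this sketch has a fixed capacity chosen up front, which hints at 
why writing is the harder part: growing a file would mean allocating a new 
buffer and copying.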
                
      was (Author: thetaphi):
    bq. IndexSearch.doc calls RAMFile indirectly through RAMInputStream. Once the 
search is complete you need to know what has been found, so IndexSearch.doc 
needs to be called, i.e. RAMInputStream calls RAMFile.numBuffers and getBuffer 
in switchCurrentBuffer, which happens a lot during my searches.

If you use RAMDirectory on a large index, it slows things down because it 
drives the garbage collector crazy. Use an on-disk index with MMapDirectory, 
which has no locking at all apart from the occasionally called 
IndexInput.clone (and if you remove that lock, your JVM will SIGSEGV when 
Lucene is used incorrectly from multiple threads).

RAMDirectory is written for tests, not for production use. There are already 
plans to remove it from Lucene trunk and move it to the test code only. Have 
you seen that it allocates its buffers in 8 KB blocks? Calculate how many 
byte[] objects you have for a 50 GB index... The GC will go crazy when it 
starts to clean up. And then it stops your whole application, not because of 
locking inside RAMFile, but because it does a stop-the-world GC.

We are working on a RAMDirectory-like approach that stores the files outside 
the Java heap using a large DirectByteBuffer (which is the same code as 
MMapDirectory). The problem is writing to such a directory, but loading is as 
fast as MMapDirectory.
                  
> Lucene Search not scalling
> --------------------------
>
>                 Key: LUCENE-3653
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3653
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Gerrit Jansen van Vuuren
>         Attachments: App.java, 
> LUCENE-3653-VirtualMethod+AttributeSource.patch, 
> LUCENE-3653-VirtualMethod+AttributeSource.patch, 
> LUCENE-3653-VirtualMethod+AttributeSource.patch, LUCENE-3653-no-sync.png, 
> LUCENE-3653-sync-.png, LUCENE-3653.patch, 
> LUCENE-3653.patch-BiasedLockingStartupDelay_1.png, 
> LUCENE-3653.patch-BiasedLockingStartupDelay_2.png, 
> LUCENE-3653.patch-BiasedLockingStartupDelay_3.png, 
> Threads-LUCENE-3653.patch.png, lucene-unsync.diff, profile_1_a.png, 
> profile_1_b.png, profile_1_c.png, profile_1_d.png, profile_2_a.png, 
> profile_2_b.png, profile_2_c.png
>
>
> I've noticed that when doing thousands of searches in a single thread the 
> average time is quite low i.e. a few milliseconds. When adding more 
> concurrent searches doing exactly the same search the average time increases 
> drastically. 
> I've profiled the search classes and found that the whole of Lucene blocks on 
> org.apache.lucene.index.SegmentCoreReaders.getTermsReader
> org.apache.lucene.util.VirtualMethod
>   public synchronized int getImplementationDistance 
> org.apache.lucene.util.AttributeSource.getAttributeInterfaces
> These cause search times to increase from a few milliseconds to up to 2 
> seconds when doing 500 concurrent searches on the same in-memory index. Note 
> that the index is not being updated at all, so no refresh methods are called 
> at any stage.
> Some questions:
>   Why do we need synchronization here?
>   There must be a non-lockable solution for these, they basically cause 
> lucene to be ok for single thread applications but disastrous for any 
> concurrent implementation.
> I'll do some experiments by removing the synchronization from the methods of 
> these classes.
