[
https://issues.apache.org/jira/browse/LUCENE-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171754#comment-13171754
]
Uwe Schindler edited comment on LUCENE-3653 at 12/18/11 2:04 AM:
-----------------------------------------------------------------
bq. Creating a Single -Tokenizer-Analyzer does help, but the thread blocking
still happens because of the synchronization used in several classes.
Not reusing would be a mistake because of the heavy construction cost (the issue there is construction, not contention).
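To make the reuse point concrete, here is a rough sketch (not from any patch on this issue; it assumes the Lucene 3.x API and an illustrative "body" field) of sharing one Analyzer for the whole application instead of constructing one per search:
{code:java}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public final class SharedAnalyzer {
  // One Analyzer instance for the whole application; never create one per request.
  // Analyzers are thread-safe, so the construction cost is paid only once.
  private static final Analyzer ANALYZER = new StandardAnalyzer(Version.LUCENE_35);

  public static Query parse(String userQuery) throws Exception {
    // QueryParser is cheap but not thread-safe, so create it per call
    // while still reusing the shared Analyzer underneath.
    return new QueryParser(Version.LUCENE_35, "body", ANALYZER).parse(userQuery);
  }
}
{code}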
bq. I agree, if anybody has to decide between concurrency or storing things
twice, then concurrency wins; eventually all the cache data will be available to
all threads, and the overhead goes away. But with synchronization the overhead
never goes away.
There are places where we cannot remove synchronization - and those places are
not an issue at all. Just because there is synchronization does not mean there is
a bottleneck. Not everything you mention is an issue.
bq. RAMFile: all methods are synchronized.
There is contention, but it will not slow down your search. Please keep the
synchronization there. Every RAMFile is only opened once, and then the contention
is gone. Not everything your profiler shows as contention actually is one; only
the first query will have some minor contention.
bq. RAMInputStream: clone() This method came up during the profiling a lot. I
changed it from calling clone to just creating a new instance directly.
That's fine, but the same applies here. You only have contention on the first few
queries.
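As a sketch of the usage pattern I mean (my example only, assuming Lucene 3.x and an index already loaded into a RAMDirectory): open the reader once and share one IndexSearcher across all threads, so the synchronized RAMFile open path is only hit while the reader warms up:
{code:java}
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public final class SharedSearcher {
  private final IndexSearcher searcher;

  public SharedSearcher(RAMDirectory dir) throws Exception {
    // Opened exactly once; IndexReader and IndexSearcher are thread-safe.
    this.searcher = new IndexSearcher(IndexReader.open(dir));
  }

  public TopDocs search(Query query) throws Exception {
    // Safe to call from many threads concurrently; after the first few queries
    // every RAMFile has been opened and the synchronized paths are no longer hot.
    return searcher.search(query, 10);
  }
}
{code}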
bq. I'll try to cleanup some of the code and add a better diff.
The VirtualMethod and AttributeSource issues are already fixed in my patch.
On the timeline of your profiler output I see no improvement in speed. How
much faster does your code actually get?
> Lucene Search not scaling
> -------------------------
>
> Key: LUCENE-3653
> URL: https://issues.apache.org/jira/browse/LUCENE-3653
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Gerrit Jansen van Vuuren
> Attachments: App.java,
> LUCENE-3653-VirtualMethod+AttributeSource.patch,
> LUCENE-3653-VirtualMethod+AttributeSource.patch, lucene-unsync.diff,
> profile_1_a.png, profile_1_b.png, profile_1_c.png, profile_1_d.png,
> profile_2_a.png, profile_2_b.png, profile_2_c.png
>
>
> I've noticed that when doing thousands of searches in a single thread the
> average time is quite low, i.e. a few milliseconds. When adding more
> concurrent threads doing exactly the same search, the average time increases
> drastically.
> I've profiled the search classes and found that the whole of Lucene blocks on
> org.apache.lucene.index.SegmentCoreReaders.getTermsReader
> org.apache.lucene.util.VirtualMethod (public synchronized int getImplementationDistance)
> org.apache.lucene.util.AttributeSource.getAttributeInterfaces
> These cause search times to increase from a few milliseconds to up to 2
> seconds when doing 500 concurrent searches on the same in-memory index. Note
> that the index is not being updated at all, so no refresh methods are called
> at any stage.
> Some questions:
> Why do we need synchronization here?
> There must be a lock-free solution for these; they basically cause
> Lucene to be fine for single-threaded applications but disastrous for any
> concurrent implementation.
> I'll do some experiments by removing the synchronization from the methods of
> these classes.
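For reference, a hypothetical harness along the lines of the setup described above (this is not the attached App.java; the field name, query term, and class names are illustrative), measuring the average latency of many threads running the same query against one shared IndexSearcher:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public final class ConcurrentSearchBench {
  public static void bench(final IndexSearcher searcher, int threads,
      final int searchesPerThread) throws InterruptedException {
    final AtomicLong totalNanos = new AtomicLong();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            for (int i = 0; i < searchesPerThread; i++) {
              long start = System.nanoTime();
              // Same query from every thread, as in the reported scenario.
              searcher.search(new TermQuery(new Term("body", "lucene")), 10);
              totalNanos.addAndGet(System.nanoTime() - start);
            }
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
    long searches = (long) threads * searchesPerThread;
    System.out.println("avg ms/search: " + totalNanos.get() / 1e6 / searches);
  }
}
{code}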