RE: Binding lucene instance/threads to a particular processor(or core)

Renaud Waldura Tue, 22 Apr 2008 14:59:58 -0700

That's an excellent idea. I would certainly use such an improved
MultiSearcher.
You should submit a patch.


-----Original Message-----
From: Glen Newton [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 22, 2008 10:50 AM
To: java-user@lucene.apache.org
Subject: Re: Binding lucene instance/threads to a particular processor(or
core)

So even if you only have one index, this is the way to go to manage this
kind of problem.

Looking at the implementation, and having used ThreadPoolExecutor (TPE) a
lot, I would make the following suggestions for this class so as to better
support this particular use case:
Better access to the configuration of the TPE is needed: the ability to
choose the number of threads (core pool size and maximum pool size), the
type and nature of the queue, etc. Also, the default behaviour of TPE is to
throw an exception when a job is submitted while the queue is full and the
pool is already at its maximum size. Throwing an exception is expensive,
especially in a high load situation where dozens or hundreds of searches
are being rejected. Instead, being able to set the RejectedExecutionHandler
on the TPE would allow for graceful handling of rejected queries (feedback
to the user application, etc.).
Also, TPE allows for a custom ThreadFactory, which I use to produce threads
with the highest priority to do the searching.
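
To make the three suggestions above concrete, here is a minimal sketch of a
bounded TPE with a non-throwing RejectedExecutionHandler and a max-priority
ThreadFactory. The names SearchExecutors, CountingRejectionHandler,
MaxPriorityThreadFactory and newSearchPool are made up for illustration;
they are not part of Lucene or of the LUCENE-423 patch. Only the
java.util.concurrent classes are real.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class; an illustration only, not Lucene API.
class SearchExecutors {

    // Records rejected tasks instead of throwing RejectedExecutionException.
    // A real application would report "too busy" back to the caller here.
    static class CountingRejectionHandler implements RejectedExecutionHandler {
        final AtomicInteger rejected = new AtomicInteger();

        @Override
        public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
            rejected.incrementAndGet();
        }
    }

    // Produces daemon threads that run at the highest priority.
    static class MaxPriorityThreadFactory implements ThreadFactory {
        private final AtomicInteger counter = new AtomicInteger();

        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "search-" + counter.incrementAndGet());
            t.setPriority(Thread.MAX_PRIORITY);
            t.setDaemon(true);
            return t;
        }
    }

    // Fixed number of threads, bounded queue, caller-supplied rejection policy.
    static ThreadPoolExecutor newSearchPool(int threads, int queueSize,
                                            RejectedExecutionHandler handler) {
        return new ThreadPoolExecutor(
                threads, threads,                 // core size == max size
                0L, TimeUnit.MILLISECONDS,        // no idle timeout needed
                new ArrayBlockingQueue<Runnable>(queueSize),
                new MaxPriorityThreadFactory(),
                handler);
    }
}
```

With this shape, a request that cannot be queued reaches the handler as a
plain method call, so the cost of building an exception per rejected search
is avoided.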

Right now the implementation sets the number of threads and the queue size
as a function of the number of Searchables, which is reasonable for this use
case. But it would generalize better if these were a function of the number
of cores on the machine, or some combination of the two.
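
A rough sketch of what such sizing could look like. PoolSizing and its
methods are hypothetical names, and taking the larger of the two values is
just one possible "combination"; the right numbers would have to come from
experiment.

```java
// Hypothetical sizing helper; an illustration only, not Lucene API.
class PoolSizing {

    // Number of cores reported by the JVM (always at least 1).
    static int detectedCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Thread count as a function of cores only, e.g. 2 threads per core.
    static int coreBasedThreads(int cores, int threadsPerCore) {
        return Math.max(1, cores * threadsPerCore);
    }

    // One possible combination: never fewer threads than Searchables,
    // and never fewer than the core-based value.
    static int combinedThreads(int searchables, int cores, int threadsPerCore) {
        return Math.max(searchables, coreBasedThreads(cores, threadsPerCore));
    }
}
```

For a single index on a 4-core machine with 2 threads per core,
combinedThreads(1, 4, 2) gives 8 threads rather than 1.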

That said, I would suggest adding the ability to set the
ThreadPoolExecutor completely, with a getter/setter. This would make this
class useful beyond the use case of multiple indexes: it would also
support the use case of one (or more) indexes under a high load of queries
that needs to be managed. It could use the same defaults if the TPE is not
set externally (is null).
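
Roughly what I have in mind, as a sketch only. ConfigurableSearcher is a
made-up stand-in for the searcher class, and the default pool and queue
sizes (one per Searchable) are an assumption about "the same defaults",
not a copy of what the existing patch does.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the searcher class; not Lucene API.
class ConfigurableSearcher {
    private final int searchables;
    private ThreadPoolExecutor executor; // null means "use the defaults"

    ConfigurableSearcher(int searchables) {
        this.searchables = Math.max(1, searchables);
    }

    // Lets the application supply a fully configured TPE.
    // Passing null switches back to the defaults on the next getExecutor().
    synchronized void setExecutor(ThreadPoolExecutor executor) {
        this.executor = executor;
    }

    // Returns the externally set TPE, or lazily builds a default one
    // sized by the number of Searchables (assumed default, see above).
    synchronized ThreadPoolExecutor getExecutor() {
        if (executor == null) {
            executor = new ThreadPoolExecutor(
                    searchables, searchables,
                    0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<Runnable>(searchables));
        }
        return executor;
    }
}
```

An application with one index and a heavy query load would then call
setExecutor(...) with its own pool, while existing users would see no
change in behaviour.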

-Glen

2008/4/22 Renaud Waldura <[EMAIL PROTECTED]>:
>  > one solution is to set-up a ThreadPoolExecutor[2] with a fixed
>  > number of threads and a limited queue size (use a bound
>  > BlockingQueue[3])
>
>  Yes, this is precisely how the ConcurrentMultiSearcher works.
>  https://issues.apache.org/jira/browse/LUCENE-423
>
>
>
>
>  -----Original Message-----
>  From: Glen Newton [mailto:[EMAIL PROTECTED]
>  Sent: Tuesday, April 22, 2008 5:40 AM
>  To: java-user@lucene.apache.org
>  Subject: Re: Binding lucene instance/threads to a particular
>  processor(or core)
>
>
>
>  Anshum,
>
>  I think I am dealing with an index of similar scale: 6.4 million
>  records, 83 GB index (see [1] for more info).
>
>  I mistakenly thought from your original posting that you were
>  interested in binding threads to processors for indexing, but it is
>  sounding like you want to do this for searching. I am not sure if
>  this would work well for searching, as there is a great deal more
>  ephemeral state. But I am not sure.
>
>  With respect to handling concurrency for search, could you describe
>  your environment better?
>  - Is it a web-based application?
>  - What sort of problems do you have now?
>  - What are your java command line parameters (heap, etc.)?
>  - What are the performance expectations, e.g. average search time < N1
>    seconds; median search time < N2 seconds?
>  - Are you storing any fields and if so, do you need to store so many?
>  - Do you have the choice of serving searches from multiple machines,
>    i.e. load balancing across >1 machine? We've found this to be one of
>    the best solutions for scaling.
>  - One thing that I do for both indexing and searching is to always
>    shift the priority of the threads doing these tasks to MAX, so that
>    they are run in preference to threads doing other things, like
>    preparing Documents, etc.
>
>  If it is a web-type environment, one solution is to set up a
>  ThreadPoolExecutor[2] with a fixed number of threads and a limited
>  queue size (use a bounded BlockingQueue[3]). You would have to
>  experiment with the numbers to find the sweet spot for your situation.
>  I would suggest starting with 2 times the number of processors (cores)
>  and a queue of, say, 20. Requests are queued, but if a request cannot
>  be queued, at least the application then knows that it is too busy
>  and you can give the user a message along the lines of "The system is
>  too busy at this moment, please try again in a few seconds...". The
>  advantage is also that when things are very busy, the accepted
>  requests are handled in a reasonable amount of time and some users are
>  told things are too busy, as opposed to all the requests going in, the
>  system thrashing (and perhaps running out of memory at the same time)
>  and everyone's queries being very slow.
>
>  But I would suggest setting up a test harness to emulate your
>  production conditions to try these things out...
>
>  -Glen
>
>  
>  [1] http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
>  [2] http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
>  [3] http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html
>
>
>  2008/4/22 Anshum <[EMAIL PROTECTED]>:
>  >  The paper seems pretty good, but I am still wondering if there is a
>  >  way to achieve this through the command line parameters. I'm just
>  >  trying this to optimize the code; if this works, I will let everyone
>  >  know, and otherwise I will still keep everyone informed :)
>  >  Any other suggestions for handling a concurrency of over 7 search
>  >  requests per second for an index size of over 15 GB containing over
>  >  13 million records?
>  >  Also, could someone help me with obtaining an 'index size' -
>  >  'concurrency' - 'processor power' - 'memory' relationship formula (or
>  >  something similar)?
>  >
>  >  --
>  >  Anshum
>  >
>  >
>  >
>  >
>  >  On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman <[EMAIL PROTECTED]> wrote:
>  >
>  >  > That paper from 1997 is pretty old, but mirrors our experiences in
>  >  > those days. Then, we used Solaris processor sets to really improve
>  >  > performance by binding one of our processes to a particular CPU
>  >  > while leaving the other CPUs to manage the thread intensive work.
>  >  >
>  >  > You can bind processes/LWPs to a CPU on Solaris with psrset.
>  >  >
>  >  > The Solaris thread model in the late '90s was also a significant
>  >  > factor in performance of multi-threaded programs. The default
>  >  > thread library in Solaris 8 implemented a MxN unbound thread model
>  >  > (threads/LWPs). In those days we found that it did not perform
>  >  > well, so used the bound thread model (i.e. 1:1) where a Solaris
>  >  > thread was bound permanently to an LWP. That improved performance
>  >  > a lot. In Solaris 8, Sun had what they called the 'alternate'
>  >  > thread library (T2) around 2000, which became the default library
>  >  > in Solaris 9, and implemented a 1:1 model of Solaris threads to
>  >  > LWPs. That new library had dramatic performance improvements over
>  >  > the old.
>  >  >
>  >  > Some background info for Java and threading:
>  >  >
>  >  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  >  >
>  >  > Antony
>  >  >
>  >  >
>  >  >
>  >  > Glen Newton wrote:
>  >  >
>  >  > > I realised that not everyone on this list might be able to access
>  >  > > the IEEE paper I pointed out, so I have included the abstract and
>  >  > > some paragraphs from the paper below.
>  >  > >
>  >  > > Also of interest (and should be available to all): Fedorova et al.
>  >  > > 2005. Performance of Multithreaded Chip Multiprocessors And
>  >  > > Implications For Operating System Design. Usenix 2005.
>  >  > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
>  >  > > "Abstract: We investigated how operating system design should be
>  >  > > adapted for multithreaded chip multiprocessors (CMT) - a new
>  >  > > generation of processors that exploit thread-level parallelism to
>  >  > > mask the memory latency in modern workloads. We determined that
>  >  > > the L2 cache is a critical shared resource on CMT and that an
>  >  > > insufficient amount of L2 cache can undermine the ability to hide
>  >  > > memory latency on these processors. To use the L2 cache as
>  >  > > efficiently as possible, we propose an L2-conscious scheduling
>  >  > > algorithm and quantify its performance potential. Using this
>  >  > > algorithm it is possible to reduce miss ratios in the L2 cache by
>  >  > > 25-37% and improve processor throughput by 27-45%."
>  >  > >
>  >  > >
>  >  > > From Lundberg, L. 1997:
>  >  > > Abstract: "The default scheduling algorithm in Solaris and other
>  >  > > operating systems may result in frequent relocation of threads at
>  >  > > run-time. Excessive thread relocation cause poor memory
>  >  > > performance. This can be avoided by binding threads to
>  >  > > processors. However, binding threads to processors may result in
>  >  > > an unbalanced load. By considering a previously obtained
>  >  > > theoretical result and by evaluating a set of multithreaded
>  >  > > Solaris programs using a multiprocessor with 8 processors, we are
>  >  > > able to bound the maximum performance loss due to binding
>  >  > > threads. The theoretical result is also recapitulated. By
>  >  > > evaluating another set of multithreaded programs, we show that
>  >  > > the gain of binding threads to processors may be substantial,
>  >  > > particularly for programs with fine grained parallelism."
>  >  > >
>  >  > > First paragraph: "The thread concept in Solaris [3][5] and other
>  >  > > operating systems makes it possible to write multithreaded
>  >  > > programs, which can be executed in parallel on a multiprocessor.
>  >  > > Previous experience from real world programs [4] show that, using
>  >  > > the default scheduling algorithm in Solaris, threads are
>  >  > > frequently relocated from one processor to another at run-time.
>  >  > > After each such relocation, the code and data associated with the
>  >  > > relocated thread is moved from the cache memory of the old
>  >  > > processor to the cache of the new processor. This reduces the
>  >  > > performance. Run-time relocation of threads to processors can
>  >  > > also result in unpredictable response times. This is a problem in
>  >  > > systems which operate in a real-time environment. In order to
>  >  > > avoid poor memory performance and unpredictable real-time
>  >  > > behaviour due to frequent thread relocation, threads can be bound
>  >  > > to processors using the processor-bind directive [3] [5]. The
>  >  > > major problem with binding threads is that one can end up with an
>  >  > > unbalanced load, i.e. some processors may be extremely busy
>  >  > > during some time periods while other processors are idle."
>  >  > >
>  >  > > -Glen
>  >  > >
>  >  > > On 21/04/2008, Glen Newton <[EMAIL PROTECTED]> wrote:
>  >  > >
>  >  > > > And this discussion on bound threads may also shed light on
>  >  > > > things:
>  >  > > >
>  >  > > > http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/2007-11/msg02801.html
>  >  > > >
>  >  > > >
>  >  > > >  -Glen
>  >  > > >
>  >  > > >
>  >  > > >  On 21/04/2008, Glen Newton <[EMAIL PROTECTED]> wrote:
>  >  > > >  >  Binding threads to processors - in many situations - improves
>  >  > > >  >  throughput by reducing memory overhead. When a thread is
>  >  > > >  >  running on a core, its state is local; if it is timeshared-out
>  >  > > >  >  and either 1) swapped back in on the same core, it is likely
>  >  > > >  >  that there will be a hit in the core's L1 cache; or 2) swapped
>  >  > > >  >  in onto another core, there will not be a cache hit and the
>  >  > > >  >  state will have to be fetched from L2 or main memory,
>  >  > > >  >  incurring a performance hit, esp. in the latter case. See
>  >  > > >  >  Lundberg, L. 1997. Evaluating the Performance Implications of
>  >  > > >  >  Binding Threads to Processors. 393.
>  >  > > >  >  http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
>  >  > > >  >  for more info.
>  >  > > >  >
>  >  > > >  >  If you are using JVM on Solaris on SPARC, you should take a
>  >  > > >  >  look at the following links for tuning (the Sun JVM on Solaris
>  >  > > >  >  SPARC has many more performance tuning parameters available),
>  >  > > >  >  including threading:
>  >  > > >  >  - http://java.sun.com/docs/hotspot/threads/threads.html
>  >  > > >  >  - http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  >  > > >  >  - http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg21107291
>  >  > > >  >  - http://java.sun.com/javase/technologies/performance.jsp
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >  -Glen
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >  On 21/04/2008, Ulf Dittmer <[EMAIL PROTECTED]> wrote:
>  >  > > >  >  >  This sounds odd. Why would restricting it to a single
>  >  > > >  >  >  core improve performance? The point of using multiple
>  >  > > >  >  >  cores (and multiple threads) is to improve performance
>  >  > > >  >  >  isn't it? I'd leave thread scheduling decisions to the
>  >  > > >  >  >  JVM. Plus, I don't think there is anything in Java to
>  >  > > >  >  >  facilitate this (short of using JNI).
>  >  > > >  >  >
>  >  > > >  >  >  Are you talking about indexing or searching? You may
>  >  > > >  >  >  be able to use multiple parallel threads to improve
>  >  > > >  >  >  indexing performance. I don't think Lucene uses
>  >  > > >  >  >  multi-threading for searching; not unless you have
>  >  > > >  >  >  multiple indices, anyway.
>  >  > > >  >  >
>  >  > > >  >  >  Ulf
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >  --- Anshum <[EMAIL PROTECTED]> wrote:
>  >  > > >  >  >
>  >  > > >  >  >  > Hi,
>  >  > > >  >  >  >
>  >  > > >  >  >  > I have been trying to bind my lucene instance (JVM -
>  >  > > >  >  >  > Sun Hotspot*) to a particular core so as to improve the
>  >  > > >  >  >  > performance. Is there a way to do so or is there support
>  >  > > >  >  >  > in lucene to explicitly control the thread - processor
>  >  > > >  >  >  > linkup?
>  >  > > >  >  >  >
>  >  > > >  >  >  > --
>  >  > > >  >  >  > The facts expressed here belong to everybody, the
>  >  > > >  >  >  > opinions to me.
>  >  > > >  >  >  > The distinction is yours to draw............
>  >  > > >  >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >  ---------------------------------------------------------------------
>  >  > > >  >  >  To unsubscribe, e-mail: [EMAIL PROTECTED]
>  >  > > >  >  >  For additional commands, e-mail: [EMAIL PROTECTED]



