Not me, but it looks useful and something I could actually use (exactly to look 
at synchronization bottlenecks in situations where many threads are sharing a 
single IndexSearcher).  Unfortunately, it looks like it works only with IBM's 
JVM: "Any platform running an IBM®-supplied Java™ SDK or JRE, Version 5.0 or 
above." :(

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Mark Miller <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Sunday, January 20, 2008 9:46:25 AM
Subject: Re: Multiple searchers (Was: CachingWrapperFilter: why cache per 
IndexReader?)

Anyone tried using this on Lucene yet?
http://www.alphaworks.ibm.com/tech/jla

Michael McCandless wrote:
>
> These results are very interesting.  With 3 threads on SSD your
> searches run 87% faster if you use 3 IndexSearchers instead of sharing
> a single one.
>
> This means, for your test, there are some crazy synchronization
> bottlenecks when searching, which I think we should ferret out and fix.
>
> Have you done any profiling to understand where the threads are
> waiting when you share one IndexSearcher?  EG YourKit can tell you
> where the threads are waiting...
>
> I know there is synchronization used when reading bytes from the
> underlying file descriptor.  We've investigated options to remove that
> (https://issues.apache.org/jira/browse/LUCENE-753) but those options
> seemed to hurt single threaded performance.  I wonder if the patch on
> that issue closes some of this 87% performance loss?
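
To make the kind of bottleneck Mike describes concrete, here is a minimal sketch in plain Java (no Lucene classes) of two ways to read bytes from one shared file: a seek-then-read guarded by a lock, and a positional read that passes the offset with each call. The class and method names (ReadStyles, lockedRead, positionalRead) are made up for illustration, and I am not claiming this is what Lucene or the LUCENE-753 patch actually does.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical illustration class, not part of Lucene.
final class ReadStyles {
    /** Shared file pointer: seek and read must be atomic, so all callers contend on one lock. */
    static byte[] lockedRead(RandomAccessFile file, long pos, int len) throws IOException {
        byte[] out = new byte[len];
        synchronized (file) {
            file.seek(pos);
            file.readFully(out);
        }
        return out;
    }

    /** Positional read: the offset travels with the call, so no shared file pointer is moved. */
    static byte[] positionalRead(FileChannel channel, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            // read(dst, position) may return fewer bytes than requested, hence the loop
            int n = channel.read(buf, pos + buf.position());
            if (n < 0) {
                throw new EOFException("end of file reached at offset " + (pos + buf.position()));
            }
        }
        return buf.array();
    }
}
```

With the first style, every thread sharing the file queues up on the same monitor; with the second, the application code holds no lock of its own, although the JVM or OS may still serialize reads internally on some platforms.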
>
> Does anyone know of other synchronization bottlenecks in searching?
>
> Mike
>
> Otis Gospodnetic wrote:
>
>> This is great and valuable information, Toke(n)!
>> Just the other day we recommended this multi-IndexSearcher to
>> somebody concerned with low QPS rates their benchmarks revealed.
>> They were hitting their index with a good number of threads and
>> hitting synchronized blocks in Lucene.  Multiple searchers is one way
>> around that.  Also, your sweet spot of 3 makes sense - keeps all of
>> your cores fully busy.
>>
>> You are our main SSD info supplier -- keep it coming! :)  And let us
>> know what numbers you get for 2.2 and 2.3, please.
>>
>> Thanks,
>> Otis
>>
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>> From: Toke Eskildsen <[EMAIL PROTECTED]>
>> To: java-user@lucene.apache.org
>> Sent: Thursday, January 17, 2008 5:31:56 AM
>> Subject: Multiple searchers (Was: CachingWrapperFilter: why cache per
>> IndexReader?)
>>
>> On Fri, 2008-01-11 at 11:34 +0100, Toke Eskildsen wrote:
>>> As for shared searcher vs. individual searchers, there was just a
>>> slight penalty for using individual searchers.
>>
>> Whoops! Seems like I need better QA for my test-code. I didn't use
>> individual searchers for each thread when I thought I was. The slight
>> penalty wrongly observed must have been due to measurement variations.
>>
>> With the corrected test, some interesting observations about our index
>> can be made, which will definitely affect our configuration. In the
>> following, the queries/second is an average over 350.000 queries.
>> For each query, a search is performed and the content of a specific
>> field is extracted for the first 20 hits.
>>
>> == System-summary ==
>> Dual-core Intel Xeon 5148 2.3 GHz, 8 GB RAM, Linux, Lucene 2.1,
>> 37 GB/10 million documents index, queries taken from production
>> system logs.
>>
>> == Conventional harddisks (2 * 15000 RPM in software RAID 1) ==
>> 1 thread,  1 searcher:  109 queries/sec
>> 2 threads, 1 searcher:  118 queries/sec
>> 2 threads, 2 searchers: 157 queries/sec
>> 3 threads, 1 searcher:  111 queries/sec
>> 3 threads, 3 searchers: 177 queries/sec
>> 4 threads, 1 searcher:  108 queries/sec
>> 4 threads, 4 searchers: 169 queries/sec
>>
>> == Solid State Drives (2 * 32 GB Samsung in software RAID 0) ==
>> 1 thread,  1 searcher:  193 queries/sec
>> 2 threads, 1 searcher:  295 queries/sec
>> 2 threads, 2 searchers: 357 queries/sec
>> 3 threads, 1 searcher:  197 queries/sec
>> 3 threads, 3 searchers: 369 queries/sec
>> 4 threads, 1 searcher:  192 queries/sec
>> 4 threads, 4 searchers: 302 queries/sec
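
For anyone wanting to check where the 87% figure comes from, it appears to be the 3-thread SSD numbers above (369 vs. 197 queries/sec). A small sketch of the arithmetic; the class and method names are made up for this mail, and only the queries/sec values come from the tables:

```java
// Hypothetical helper, not part of Lucene: relative throughput gain in percent.
final class SearcherSpeedup {
    /** How much faster individual searchers are than a shared one, in percent, rounded. */
    static long percentFaster(double individualQps, double sharedQps) {
        return Math.round((individualQps / sharedQps - 1.0) * 100.0);
    }

    public static void main(String[] args) {
        // queries/sec from the tables above: N threads with N searchers vs. 1 shared searcher
        System.out.println("HDD, 3 threads: " + percentFaster(177, 111) + "% faster");
        System.out.println("SSD, 3 threads: " + percentFaster(369, 197) + "% faster");
    }
}
```

The same calculation on the conventional disks (177 vs. 111) gives roughly 59%, so the gap is large on both kinds of storage.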
>>
>> Graphs can be viewed at http://wiki.statsbiblioteket.dk/summa/Hardware
>>
>> For our setup it seems that the usual avoid-multiple-searchers advice
>> is not valid, neither for conventional harddisks, nor Solid State
>> Drives. The optimal configuration for our dual-core test machine is
>> three threads with individual searchers. The obvious question is
>> whether this can be extended to other cases.
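
One way the "individual searcher per thread" setup could be wired up is with a ThreadLocal, sketched below in plain Java. Searcher here is a hypothetical stand-in interface, not Lucene's IndexSearcher, and PerThreadSearchers is a made-up name; I don't know how Toke's test code actually does it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a real searcher type; not a Lucene interface. */
interface Searcher {
    int search(String query);
}

/** Hands each calling thread its own searcher, created lazily on first use. */
final class PerThreadSearchers {
    private final AtomicInteger opened = new AtomicInteger();
    private final ThreadLocal<Searcher> local;

    PerThreadSearchers(Supplier<Searcher> factory) {
        this.local = ThreadLocal.withInitial(() -> {
            opened.incrementAndGet();   // count how many searchers were created
            return factory.get();
        });
    }

    /** Returns the calling thread's searcher; repeated calls on one thread reuse the same instance. */
    Searcher get() {
        return local.get();
    }

    int openedCount() {
        return opened.get();
    }
}
```

A worker thread would call get() once per query and never share the result. With real searchers, each instance would hold its own file handles and caches and would need closing, which this sketch leaves out.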
>>
>>> As for threading, I noticed something strange: On the dual-core
>>> machine, two threads gave better performance than one, while 4 threads
>>> gave the same performance as one.
>>
>> As can be seen above, this strange picture is consistent. 1, 3 and 4
>> threads with shared searcher performs the same, independent of which
>> storage the machine uses, while 2 threads performs markedly better.
>>
>> I've started the same test-suite for Lucene 2.2 and 2.3RC2. It should
>> be finished in a day or two.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
