The reason there is a ParallelMusltiSearcher is because of the reasons
given: if you are distributing your index across machines or hard
drives, doing things in parallel is fater. I don't think RAID counts.
RAID will do the parallelism for you with a single index.
I say I believe because its hard to be certain with Lucene...there are a
lot of variables depending on your setup, and frankly, a lot my
information is not hard core, but from memories of posts and work and
playing, etc. I don't want to sound more definitive than I am. On a
normal system, you will find ParallelMultiSearcher slower. If your
distributing your index over multiple machines or hard drives, you will
find it faster.
This mailing list is one of the best places for best practices:
http://www.nabble.com/Lucene-f44.html
Search it, and you will find discussions on this very topic.
The next best place to look is the FAQ at the Lucene home page. There is
a growing list of advice and best practices. Your still not going to
find a lot that is definitive. A ton depends on your particular
circumstances. There has been a lot of work done lately to make sure
Lucene runs the best it can using out of the box defaults, but this can
only go so far. Lucene is *very* configurable, and the best
configuration can depend on very specific setups. The best advice is to
try and test, try and test. If your a nut for getting things perfect,
there is an excellent benchmark system in contrib that will help your
efforts tremendously.
- Mark
Timo Nentwig wrote:
On Tuesday 01 January 2008 22:24:53 Mark Miller wrote:
I believe that, in general, you'll find that ParallelMultiSearcher is
You believe or you know? And if you know why is there a ParallelMultiSearcher
at all? :)
And I still wonder why everybody belives and finds out on his own why isn't
there are comprehensive collection of common knowledge and best practices for
lucene out there?
much slower than just using a MultiSearcher. ParralelMultiSeacher is of
use when you can put the different indexes on separate hard drives or
Well, the index might not be on different HDs but *of couse* we're talking
about multiple hard drives (at least some RAID, in my case it's some
expensive netapp, however I don't know which one exactely but I can find
out...).
even better, separate systems (using RMI or something).
- Mark
Timo Nentwig wrote:
On Tuesday 01 January 2008 21:06:06 Mark Miller wrote:
The main reason to use a single IndexReader is because its very time
consuming to open an IndexReader. If your index is pretty static, maybe
Yes, it takes quite some time to build it and it's not changed but
rebuilt from scratch.
Perhaps, in some esoteric case, multiple readers is the right idea
I recently talked to a guy how stated that they'd solved their
performance issues by breaking up the index into multiple sub-indices and
searching them in parallel (probably using ParallelMultiSearcher)...hmm,
well, I've had (and still have) my doubts but on the other hand what's
the benefit of ParallelMultiSearcher if it doesn't scale better than
searching a monolithic index?
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]