Re: CachingWrapperFilter: why cache per IndexReader?

Mark Miller Tue, 01 Jan 2008 13:53:04 -0800

The reason there is a ParallelMusltiSearcher is because of the reasonsgiven: if you are distributing your index across machines or harddrives, doing things in parallel is fater. I don't think RAID counts.RAID will do the parallelism for you with a single index.

I say I believe because its hard to be certain with Lucene...there are alot of variables depending on your setup, and frankly, a lot myinformation is not hard core, but from memories of posts and work andplaying, etc. I don't want to sound more definitive than I am. On anormal system, you will find ParallelMultiSearcher slower. If yourdistributing your index over multiple machines or hard drives, you willfind it faster.

This mailing list is one of the best places for best practices:http://www.nabble.com/Lucene-f44.html


Search it, and you will find discussions on this very topic.

The next best place to look is the FAQ at the Lucene home page. There isa growing list of advice and best practices. Your still not going tofind a lot that is definitive. A ton depends on your particularcircumstances. There has been a lot of work done lately to make sureLucene runs the best it can using out of the box defaults, but this canonly go so far. Lucene is *very* configurable, and the bestconfiguration can depend on very specific setups. The best advice is totry and test, try and test. If your a nut for getting things perfect,there is an excellent benchmark system in contrib that will help yourefforts tremendously.


- Mark



Timo Nentwig wrote:

On Tuesday 01 January 2008 22:24:53 Mark Miller wrote:

I believe that, in general, you'll find that ParallelMultiSearcher is

You believe or you know? And if you know why is there a ParallelMultiSearcherat all? :)

And I still wonder why everybody belives and finds out on his own why isn'tthere are comprehensive collection of common knowledge and best practices forlucene out there?

much slower than just using a MultiSearcher. ParralelMultiSeacher is of
use when you can put the different indexes on separate hard drives or

Well, the index might not be on different HDs but *of couse* we're talkingabout multiple hard drives (at least some RAID, in my case it's someexpensive netapp, however I don't know which one exactely but I can findout...).

even better, separate systems (using RMI or something).

- Mark

Timo Nentwig wrote:

On Tuesday 01 January 2008 21:06:06 Mark Miller wrote:

The main reason to use a single IndexReader is because its very time
consuming to open an IndexReader. If your index is pretty static, maybe

Yes, it takes quite some time to build it and it's not changed but
rebuilt from scratch.

Perhaps, in some esoteric case, multiple readers is the right idea

I recently talked to a guy how stated that they'd solved their
performance issues by breaking up the index into multiple sub-indices and
searching them in parallel (probably using ParallelMultiSearcher)...hmm,
well, I've had (and still have) my doubts but on the other hand what's
the benefit of ParallelMultiSearcher if it doesn't scale better than
searching a monolithic index?

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: CachingWrapperFilter: why cache per IndexReader?

Reply via email to