Just to weigh in with my opinion... the compound file format proves fine in my use of Lucene and I use it 'by default' already. So I'm +1 on making it the default behavior.

Erik


On Mar 8, 2004, at 3:25 PM, Doug Cutting wrote:


[I moved this discussion to the developer list.]

My metric here is the rate of complaint.

I'm tired of hearing about "too many file handles" problems. Usually it is caused by folks opening a new searcher for each query, and the garbage collector not collecting and closing the old ones fast enough. So it signals other problems with the application, but it is still annoying, and could be largely quashed.
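As a rough illustration of the application-side fix, here is a minimal sketch of opening one searcher and sharing it across queries instead of opening a new one per query. The SharedSearcher class and its method names are hypothetical and not part of Lucene; the opener stands in for whatever code constructs the real searcher (for example a Lucene IndexSearcher).

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not a Lucene class): opens the underlying
 * searcher once, on first use, and hands the same instance to every caller.
 */
final class SharedSearcher<T> {
    private final Supplier<T> opener; // assumption: wraps the code that opens the searcher
    private T instance;               // the single shared instance, created lazily
    private int opens;                // how many times the opener actually ran

    SharedSearcher(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, opening it only on the first call. */
    synchronized T get() {
        if (instance == null) {
            instance = opener.get();
            opens++;
        }
        return instance;
    }

    /** Number of times the underlying searcher was opened. */
    synchronized int openCount() {
        return opens;
    }
}
```

With this shape, every query calls get() on one long-lived holder, so the set of open index files stays constant no matter how many queries run. A real application would also need to close and reopen the searcher when the index changes, which this sketch leaves out.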

By some definition, anything which causes so many repeated complaints is a bug, and should be fixed. Even if it's really not a bug. It pains users of Lucene. It annoys developers of Lucene.

Think of it like mergeFactor, etc.: the default setting may not be the absolute fastest, but it is one that is likely to run well in most configurations and cause the least confusion.

Doug

Terry Steichen wrote:
I tend to agree (but with the same uncertainty as to why I feel that way).
Regards,
Terry
----- Original Message -----
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, March 08, 2004 2:34 PM
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir .... once again
I can't explain why, but I feel like the old index format should stay
by default.  I feel like I'd rather a (slightly) faster index, and
switch to the compound one when/IF I encounter problems, than have a
safer, but slower index, and never realize that there is a faster
option available.

Weak argument, I know, but some instinct in me thinks that the current
mode should remain.


Otis


--- Doug Cutting <[EMAIL PROTECTED]> wrote:


hui wrote:

Index time: compound format is 89 seconds slower.

compound format:
1389507 total milliseconds
non-compound format:
1300534 total milliseconds

The index size is 85m with 4 fields only. The files are stored in the index.


The compound format has only 3 files and the other has 13 files.

Thanks for performing this benchmark!


It looks like the compound format is around 7% slower when indexing. To my thinking that's acceptable, given the dramatic reduction in file handles. If folks really need maximal indexing performance, then they can explicitly disable the compound format.
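For anyone wanting to check the arithmetic, the following small sketch recomputes the slowdown from the two timings hui posted. The class and method names are made up for illustration; only the two millisecond totals come from the benchmark.

```java
/** Recomputes the indexing overhead from the two benchmark totals quoted in this thread. */
final class CompoundFormatOverhead {

    /** Extra wall-clock time, in milliseconds, spent by the compound format. */
    static long extraMillis(long compoundMs, long nonCompoundMs) {
        return compoundMs - nonCompoundMs;
    }

    /** Slowdown of the compound format relative to the non-compound run, in percent. */
    static double percentSlower(long compoundMs, long nonCompoundMs) {
        return 100.0 * (compoundMs - nonCompoundMs) / nonCompoundMs;
    }

    public static void main(String[] args) {
        long compound = 1_389_507L;    // "compound format" total from the benchmark
        long nonCompound = 1_300_534L; // "non-compound format" total from the benchmark
        System.out.println("extra ms: " + extraMillis(compound, nonCompound));
        System.out.println("percent slower: " + percentSlower(compound, nonCompound));
    }
}
```

The difference is 88973 ms, which matches the "89 seconds" figure, and the ratio comes out just under 7%.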


Would anyone object to making compound format the default for Lucene 1.4? This is an incompatible change, but I don't think it should break applications.


Doug

-------------------------------------------------------------------- -
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



