+1. this would be great! Michael
On May 18, 2009, at 2:06 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
As we all know, Lucene's back-compat policy necessarily hurts the out-of-the-box experience for new users: because we are only allowed to make substantial improvements to Lucene's default settings at a major release, new users won't see the improvements to our settings until a major release (typically years apart).

Lucene has a number of such default settings; some recent examples:

* Read-only IndexReader gives much better performance with threads, yet for now we must default IndexReader.open to return a non-readOnly reader
* We can now optionally turn off scoring when sorting by field (sizable speed gain), but we had to leave it on by default until 3.0
* Letting IndexReader.norms return null
* LogMergePolicy now takes deletions into account, but we had to disable it by default, since it could conceivably break back compat.
* Bug fixes in StandardAnalyzer must be delayed until 3.0 since there's a remote chance they'd break back compat in an app, or we end up adding confusing methods like "public static void setDefaultReplaceInvalidAcronym".
* NIOFSDirectory ought to be "the default" on UNIX, but it's not
* Constant score rewrite ought to be the default for most multi-term queries
* StopFilter should enable position increments by default

The fact that we are "forced" to delay such "out of the box" improvements to Lucene for so long is a frustrating cost, since it can only stunt Lucene's adoption and growth, and my sense is that it's a minority of Lucene's users that need such strict back-compat (this has been discussed before). It also clutters our APIs, because we end up creating setters/getters that often exist only to preserve a bug for the sake of back compat.

I think we can fix this. I.e., maintain our strong back-compat policy, yet still allow new users to experience the best of Lucene on every release (not just on major releases), by creating an explicit class that holds settings/defaults used by Lucene.

For example, say we create a base class named Settings. It holds the defaults for settings across all of Lucene's classes. When you create IndexReader, IndexWriter and others, you must pass in a Settings instance.

A subclass, SettingsMatching24, binds all settings to "match" 2.4's behavior. When we make improvements in 2.9, we'd add the back-compat settings to SettingsMatching24. So if your app wants to keep exactly 2.4's behavior, you'd pass in SettingsMatching24(). On upgrading to 2.9 you'd still see 2.4's behavior.

Users who'd like to see Lucene's improvements on each minor release would instead instantiate LatestAndGreatestSettings() (or CurrentVersionSettings(), or something), understanding that when they upgrade there might be biggish changes to Lucene's defaults. My guess is most users would use this settings class.

Doug actually suggested this exact idea a while back: http://www.gossamer-threads.com/lists/lucene/java-dev/54421#54421. Now that I realize we could use this to strongly decouple "users wanting precise back-compat" from "users wanting the latest & greatest", I think it's a very compelling solution.

If we do this I'd like to do it in 2.9, so that starting with 3.x we are free to change default settings w/o breaking back compat.

Thoughts?

Mike
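To make the proposal above concrete, here is a minimal Java sketch of how such a Settings hierarchy might look. It is only an illustration under stated assumptions: the class names Settings, SettingsMatching24 and LatestAndGreatestSettings come from the message, but the accessor names (readOnlyReader, scoreWhenSortingByField, stopFilterPositionIncrements) and the describeReader helper are hypothetical and do not exist in Lucene. The values chosen for each subclass follow the examples listed in the message and are not taken from any release.

```java
// Hypothetical sketch of the proposed Settings idea; not Lucene API.
public class SettingsSketch {

    /** Base class holding defaults that Lucene classes would consult. */
    static abstract class Settings {
        abstract boolean readOnlyReader();               // should IndexReader.open return a read-only reader?
        abstract boolean scoreWhenSortingByField();      // compute scores when sorting by field?
        abstract boolean stopFilterPositionIncrements(); // should StopFilter enable position increments?
    }

    /** Pins every setting to what the message describes as 2.4's behavior. */
    static class SettingsMatching24 extends Settings {
        boolean readOnlyReader() { return false; }
        boolean scoreWhenSortingByField() { return true; }
        boolean stopFilterPositionIncrements() { return false; }
    }

    /** Tracks the improved defaults; may change on every release. */
    static class LatestAndGreatestSettings extends Settings {
        boolean readOnlyReader() { return true; }
        boolean scoreWhenSortingByField() { return false; }
        boolean stopFilterPositionIncrements() { return true; }
    }

    /** Stand-in for a factory such as IndexReader.open that would take a Settings instance. */
    static String describeReader(Settings settings) {
        return settings.readOnlyReader() ? "read-only reader" : "read-write reader";
    }

    public static void main(String[] args) {
        // An app wanting strict back-compat passes the pinned settings...
        System.out.println(describeReader(new SettingsMatching24()));
        // ...while an app wanting current defaults passes the latest settings.
        System.out.println(describeReader(new LatestAndGreatestSettings()));
    }
}
```

The point of the design is that a behavior change between releases becomes a change to LatestAndGreatestSettings only, while SettingsMatching24 stays frozen, so callers choose their back-compat level explicitly at construction time.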
--------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org