Me like!

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
> From: Michael McCandless <luc...@mikemccandless.com>
> To: java-dev@lucene.apache.org
> Sent: Monday, May 18, 2009 5:06:39 PM
> Subject: Lucene's default settings & back compatibility
>
> As we all know, Lucene's back-compat policy necessarily hurts the
> out-of-the-box experience for new users: because we are only allowed
> to make substantial improvements to Lucene's default settings at a
> major release, new users won't see the improvements to our settings
> until a major release (typically years apart).
>
> Lucene has a number of default settings. Some recent examples:
>
>   * Read-only IndexReader gives much better performance with threads,
>     yet we must now default IndexReader.open to return a non-readOnly
>     reader
>
>   * We can now optionally turn off scoring when sorting by field
>     (sizable speed gain), but we had to leave it on by default until
>     3.0
>
>   * Letting IndexReader.norms return null
>
>   * LogMergePolicy now takes deletions into account, but we had to
>     disable it by default, since it could conceivably break back
>     compat.
>
>   * Bug fixes in StandardAnalyzer must be delayed until 3.0 since
>     there's a remote chance they'd break back compat in an app, or we
>     end up adding confusing methods like "public static void
>     setDefaultReplaceInvalidAcronym".
>
>   * NIOFSDirectory ought to be "the default" on UNIX, but it's not
>
>   * Constant score rewrite ought to be the default for most multi-term
>     queries
>
>   * StopFilter should enable position increments by default
>
> The fact that we are "forced" to delay such "out of the box"
> improvements to Lucene for so long is a frustrating cost, since it
> can only stunt Lucene's adoption and growth, and my sense is that
> it's a minority of Lucene's users that need such strict back-compat
> (this has been discussed before). It also clutters our APIs because
> we end up creating setters/getters that often only exist for the
> sake of a back-compat preservation of a bug.
>
> I think we can fix this: maintain our strong back-compat policy, yet
> still allow new users to experience the best of Lucene on every
> release (not just on major releases), by creating an explicit class
> that holds the settings/defaults used by Lucene.
>
> For example, say we create a base class named Settings. It holds the
> defaults for settings across all of Lucene's classes. When you create
> IndexReader, IndexWriter and others, you must pass in a Settings
> instance.
>
> A subclass, SettingsMatching24, binds all settings to "match" 2.4's
> behavior. When we make improvements in 2.9, we'd add the back-compat
> settings to SettingsMatching24. So if your app wants to keep exactly
> 2.4's behavior, you'd pass in SettingsMatching24(). On upgrading to
> 2.9 you'd still see 2.4's behavior.
>
> Users who'd like to see Lucene's improvements on each minor release
> would instead instantiate LatestAndGreatestSettings() (or
> CurrentVersionSettings(), or something), understanding that when they
> upgrade there might be biggish changes to Lucene's defaults. My guess
> is most users would use this settings class.
>
> Doug actually suggested this exact idea a while back:
>
>   http://www.gossamer-threads.com/lists/lucene/java-dev/54421#54421
>
> Now that I realize we could use this to strongly decouple "users
> wanting precise back-compat" from "users wanting the latest &
> greatest", I think it's a very compelling solution.
>
> If we do this I'd like to do it in 2.9, so that starting with 3.x we
> are free to change default settings w/o breaking back compat.
>
> Thoughts?
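A minimal Java sketch of how the proposed scheme could fit together, assuming the base Settings class carries the newest defaults and SettingsMatching24 overrides each one back to 2.4 behavior. Only the three class names Settings, SettingsMatching24 and LatestAndGreatestSettings come from the proposal; every method name, the chosen "latest" values (taken from the wish list above), ToyReader (a stand-in for IndexReader) and SettingsDemo are hypothetical, and none of this is an existing Lucene API.

```java
// Hypothetical sketch of the proposed Settings idea; none of these classes exist in Lucene.

/** Holds Lucene-wide defaults; this base class reflects the newest behavior. */
class Settings {
    /** Should opening a reader return a read-only one (better with threads)? */
    boolean readOnlyReaderByDefault() { return true; }

    /** Should sorting by field also compute scores? Turning this off is faster. */
    boolean trackScoresWhenSorting() { return false; }

    /** Should StopFilter leave position increments for removed stop words? */
    boolean stopFilterPositionIncrements() { return true; }

    /** Should multi-term queries use constant-score rewrite? */
    boolean constantScoreRewrite() { return true; }

    /** Should the merge policy take deletions into account? */
    boolean mergePolicyCountsDeletions() { return true; }
}

/** For users who want each release's improvements; defaults may change on upgrade. */
class LatestAndGreatestSettings extends Settings {
    // Inherits whatever the current release considers best; no overrides.
}

/** Pins every default to 2.4 behavior; gains an override whenever a default changes. */
class SettingsMatching24 extends Settings {
    @Override boolean readOnlyReaderByDefault() { return false; }
    @Override boolean trackScoresWhenSorting() { return true; }
    @Override boolean stopFilterPositionIncrements() { return false; }
    @Override boolean constantScoreRewrite() { return false; }
    @Override boolean mergePolicyCountsDeletions() { return false; }
}

/** Hypothetical stand-in for a component (like IndexReader) that requires a Settings instance. */
class ToyReader {
    final boolean readOnly;

    private ToyReader(boolean readOnly) { this.readOnly = readOnly; }

    /** Every construction path consults the Settings passed in, never a global default. */
    static ToyReader open(Settings settings) {
        return new ToyReader(settings.readOnlyReaderByDefault());
    }
}

public class SettingsDemo {
    public static void main(String[] args) {
        // An app upgrading from 2.4 that wants no behavior change:
        ToyReader upgradedApp = ToyReader.open(new SettingsMatching24());
        // A new app that wants the best defaults of the release it runs on:
        ToyReader newApp = ToyReader.open(new LatestAndGreatestSettings());
        System.out.println("2.4-compatible reader read-only? " + upgradedApp.readOnly);
        System.out.println("latest reader read-only? " + newApp.readOnly);
    }
}
```

Running SettingsDemo prints "false" for the 2.4-compatible reader and "true" for the latest one. The design point is that the back-compat burden moves into one subclass: changing a default later means editing the base class and adding one override to SettingsMatching24, rather than adding a static setter like setDefaultReplaceInvalidAcronym.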
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org