On Wed, Apr 14, 2010 at 12:06 AM, Marvin Humphrey <mar...@rectangular.com> wrote:
> New class names would work, too.
>
> I only mention that for the sake of completeness, though -- it's not a
> suggestion.
>

Right, to me this is just as bad. In my eyes, the Version thing really shows the problem with the analysis stuff:

* It is used by QueryParsers, etc. at both search and index time, with no really clean way to do back-compat.
* Concepts like Version and class-naming push some of the burden onto the user: users decide the back-compat level, but devs are still left with the back-compat management hassle.

The idea of a real versioned module is the same as Version and class-naming, except that it pushes the burden to the user in a more natural way (people are used to versioned jar files and things like that... not Version constants), and it relieves devs of the back-compat nightmare.

In all honesty, with the current scheme, Lucene's release schedules, and Lucene's policy, the analysis stuff will soon deadlock into being nearly unmaintainable, and to many users the API is already unconsumable: it's difficult to write reusable analyzers due to historical relics in the API, methods are named inappropriately (e.g. Tokenizer.reset(Reader) and TokenStream.reset()), users don't understand Version, and there are probably a few other things I am forgetting that are basically impossible to fix given the current state of affairs.

> I'm a little concerned about the issue DM Smith brought up: what happens when
> you have separate applications within the same JVM which have built indexes
> using separate versions of an Analyzer?
>
> That use case is supported under the current regime, but I'm not sure whether
> it would be with aggressively versioned Analyzer packages. If it's not, under
> what circumstances does that matter?
>

I think this is an advanced use case.
No offense to DM, but for every advanced user like him on java-dev, there are 100 people on java-user who don't have to juggle independently versioned indexes with different Analyzer versions within the same JVM. I think we should look at back-compat reasonably, and at the end of the day it's an open source project, so if there's some extremely advanced use case, that person can do a few Eclipse renames themselves.

> Well, for Lucy, I think we may have addressed this problem with the new back
> compat policy we're auditioning with KS:
>
>     KinoSearch spins off stable forks into new namespaces periodically. As of
>     this release, the latest is "KinoSearch1", forked from version 0.165.
>     Users who require strong backwards compatibility should use a stable fork.
>
>     The main namespace, "KinoSearch", is an unstable development branch (as
>     hinted at by its version number). Superficial API changes are frequent.
>     Hard file format compatibility breaks which require reindexing are rare,
>     as we generally try to provide continuity across multiple releases, but
>     they happen every once in a while.
>

This is a much larger issue (the concept of stable or release forks, plus a trunk that allows for quicker development), and it's definitely interesting. We spend a lot of time on backwards compatibility, but to take advantage of many new features (for example, the faster speed in release 3.1 compared with the flex-emulation APIs) you need to reindex anyway. I just think analysis is really the worst case, with few other mechanisms for back-compat, so it's especially nasty.

> Hmm, I suppose that doesn't work with the convention that the only difference
> between Lucene X.9 and Lucene Y.0 is the removal of deprecations. But if
> anything is crying out for a rethink in the Lucene back compat policy, IMO
> that's it: make major version breaks act like major version breaks and change
> stuff that needs changin'.
>

This brings up a great point: it's very unnatural for release 3.0 to be almost a no-op and for release 3.1 to provide a new default index format and support for customizing how the index is stored. And now we are looking at providing flexibility in scoring that will hopefully redefine Lucene from a vector-space search engine library into something much more flexible? This is a minor release?! I definitely think we should rethink things.

-- 
Robert Muir
rcm...@gmail.com