On Wed, Apr 14, 2010 at 12:06 AM, Marvin Humphrey <mar...@rectangular.com> wrote:

> New class names would work, too.
>
> I only mention that for the sake of completeness, though -- it's not a
> suggestion.
>

Right, to me this is just as bad.
In my eyes, the Version thing really shows the problem with the analysis
stuff:
* Used by QueryParsers, etc. at search and index time, with no real clean way
to do back-compat
* Concepts like Version and class-naming push some of the burden to the
user: users decide the back-compat level, but it still leaves devs with
the back-compat management hassle.

The idea of having a real versioned module is the same as Version and
class-naming, except that it pushes the burden to the user in a more natural
way (people are used to versioned jar files and things like that... not
Version constants), and it relieves devs of the back-compat nightmare.

In all honesty, with the current scheme, the release schedules of Lucene, and
Lucene's policy, the analysis stuff will soon deadlock into being nearly
unmaintainable. To many users, the API is already unconsumable: it's
difficult to write reusable analyzers due to historical relics in the API;
methods are named inappropriately, e.g. Tokenizer.reset(Reader) and
TokenStream.reset(); users don't understand Version; and there are probably a
few other things I am forgetting that are basically impossible to fix right
now with the current state of affairs.


> I'm a little concerned about the issue DM Smith brought up: what happens when
> you have separate applications within the same JVM which have built indexes
> using separate versions of an Analyzer?
>
> That use case is supported under the current regime, but I'm not sure whether
> it would be with aggressively versioned Analyzer packages.  If it's not, under
> what circumstances does that matter?
>

I think this is an advanced use case. No offense to DM, but for every
advanced user like him on java-dev, there are 100 people on java-user
who don't have to juggle independently versioned indexes with different
Analyzer versions within the same JVM. I think we should look at back-compat
reasonably, and at the end of the day, it's an open source project, so if
there's some extreme advanced use case, someone can do a few Eclipse renames
themselves.


> Well, for Lucy, I think we may have addressed this problem with the new back
> compat policy we're auditioning with KS:
>
>    KinoSearch spins off stable forks into new namespaces periodically. As of
>    this release, the latest is "KinoSearch1", forked from version 0.165.
>    Users who require strong backwards compatibility should use a stable fork.
>
>    The main namespace, "KinoSearch", is an unstable development branch (as
>    hinted at by its version number). Superficial API changes are frequent.
>    Hard file format compatibility breaks which require reindexing are rare,
>    as we generally try to provide continuity across multiple releases, but
>    they happen every once in a while.
>

This is a much larger issue (the concept of stable or release forks and
having a trunk that allows for quicker development) and it's definitely
interesting. We spend a lot of time on backwards compatibility, but to take
advantage of many new features (for example, faster speed with release 3.1
rather than using flex-emulation APIs) you need to reindex anyway. I just
think analysis is really the worst case, with not many other mechanisms for
back-compat, so it's especially nasty.

> Hmm, I suppose that doesn't work with the convention that the only difference
> between Lucene X.9 and Lucene Y.0 is the removal of deprecations.  But if
> anything is crying out for a rethink in the Lucene back compat policy, IMO
> that's it: make major version breaks act like major version breaks and change
> stuff that needs changin'.
>

This brings up a great point: it's very unnatural for release 3.0 to be
almost a no-op and for release 3.1 to provide a new default index format and
support for customizing how the index is stored. And now we are looking at
providing flexibility in scoring that will hopefully redefine Lucene from
being a vector-space search engine library to something much more flexible?
This is a minor release?!

I definitely think we should rethink things.

-- 
Robert Muir
rcm...@gmail.com
