Planning for the future: Lucy version 2 = "Lucy2"?

Marvin Humphrey Mon, 25 May 2009 13:28:46 -0700

Greets,

I've previously asserted that "there should never be a Lucy version 2", using
the example of CPAN/Perl as a target that does not support sane versioning --
CPAN modules that break backwards compatibility trigger instantaneous failure
in live apps as soon as the update is installed.


However, this problem afflicts not just Perl and CPAN, but any dynamically
loaded shared library.  Statically linked compiled apps are spared, because
you get to fix problems at compile time while the live app carries on, then
swap in the new app when you've finished troubleshooting.  But any time live
apps have symbols being resolved at run time, upgrading the shared library
becomes problematic.

Giving the new version of the shared library a distinct filename helps, but
isn't enough.  "liblucy.dylib" and "liblucy.2.dylib" may be able to coexist on
some systems, but if a process loads two shared libraries which both contain
the symbol 'lucy_Indexer_new', the conflict resolution process is platform
dependent.  We can't count on sane behavior.

If we're going to try very hard to avoid major breaks, though, what are our
options?  There was a long thread on java-dev last week on the topic of
backwards compatibility:

  
http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200905.mbox/%[email protected]%3e

In it, "settings" objects, and "actsAsVersion" variables were discussed as
ways to offer Java Lucene users the fruits of innovation without major version
breaks, when the innovations would otherwise cause backwards compat problems.
I've long thought that we would eventually need something like that for Lucy.
There are precedents: HTML::Parser on CPAN takes an "api_version" argument to
its constructor, very similar to what Mike McCandless was proposing with
"actsAsVersion" constructor arguments.

However, it became clear over the course of the java-dev discussion that
versioning constructor args are an unwieldy solution for large libraries.  Of
course they are an annoyance to the user, who must tediously inform each object
what version they are.  But beyond that, propagating versioning information to
satellite classes adds a challenging and error-prone constraint to the OO
hierarchy design requirements, violating the "divide and conquer" principle
where classes know as little as possible about each other.  We don't need to
make the OO design task any more difficult than it is already.
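
To make the propagation problem concrete, here is a minimal C sketch.  Every
name in it (Indexer, Analyzer, api_version, the "lowercasing" behavior change)
is hypothetical and chosen for illustration only; none of it is Lucy's or
Lucene's actual API.  The point is that the owning class has to know about,
and forward the version to, each satellite it creates.

```c
#include <assert.h>

/* Hypothetical satellite class: its behavior depends on the API version. */
typedef struct {
    int api_version;
} Analyzer;

/* Hypothetical top-level class that owns a satellite. */
typedef struct {
    int      api_version;
    Analyzer analyzer;
} Indexer;

static Analyzer Analyzer_init(int api_version) {
    Analyzer a;
    a.api_version = api_version;
    return a;
}

/* The indexer must remember to hand the version to every satellite.
 * Forgetting one of these forwards is the error-prone part. */
static Indexer Indexer_init(int api_version) {
    Indexer ix;
    ix.api_version = api_version;
    ix.analyzer    = Analyzer_init(api_version);
    return ix;
}

/* Assumption for the sketch: version 2 started lowercasing tokens. */
static int Analyzer_lowercases(const Analyzer *a) {
    return a->api_version >= 2;
}
```

A caller that passes 1 to Indexer_init() keeps the old analyzer behavior, and a
caller that passes 2 gets the new one, but only because Indexer_init() was
written to forward the argument.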

So, what options are left?  If we're going to avoid back compat breaks, and
the versioning constructor argument approach is not going to work for Lucy, we
need a rethink.

Maybe it's time to consider an option which I'd originally dismissed: fork
into a new namespace with every major release, i.e. put Lucy version 2 into the
namespace "Lucy2".  All full-length C symbols would be prefixed with variants
of "Lucy2", e.g. "lucy2_Indexer_new".  Different versions of Lucy could
actually be loaded at the same time under this model, so the system-specific
equivalents of liblucy.dylib and liblucy.2.dylib would have no problem
co-existing.  The CPAN distro for Lucy 2.0 would be released into the Perl
namespace "Lucy2".  And so on.
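
One way the C side of this could look, sketched with a token-pasting macro.
The macro names (PASTE, SYM, LUCY_PREFIX) and the function bodies are
assumptions made up for this example; only the symbol names
lucy_Indexer_new and lucy2_Indexer_new come from the discussion above.  In
practice the two definitions would live in separate shared libraries; they
are placed in one file here only to show that the names no longer collide.

```c
#include <assert.h>
#include <string.h>

/* Two-level paste so that LUCY_PREFIX is expanded before ## is applied:
 * SYM(LUCY_PREFIX, Indexer_new) becomes lucy2_Indexer_new when
 * LUCY_PREFIX is defined as lucy2_. */
#define PASTE(prefix, name) prefix##name
#define SYM(prefix, name)   PASTE(prefix, name)

/* "Version 1" build: symbols carry the lucy_ prefix. */
#define LUCY_PREFIX lucy_
const char *SYM(LUCY_PREFIX, Indexer_new)(void) { return "v1 indexer"; }
#undef LUCY_PREFIX

/* "Version 2" build: same source-level name, distinct linker symbol. */
#define LUCY_PREFIX lucy2_
const char *SYM(LUCY_PREFIX, Indexer_new)(void) { return "v2 indexer"; }
#undef LUCY_PREFIX
```

Because the linker sees two different names, a process can call both
lucy_Indexer_new() and lucy2_Indexer_new() without relying on
platform-dependent symbol resolution.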

Appending version numbers to names isn't an ideal solution.  They tend to
slide off: if I release "KinoSearch2", I'll wager that "KinoSearch" will be
the most common pronunciation, a very common misspelling and a typing
annoyance.  And I absolutely detest appended version numbers for command-line
apps like "bzip2" and "svn2".  

But at least Lucy's a library, not a command-line app, so misspelling "Lucy2"
as "Lucy" after the initial import will get you a missing-symbol error rather
than incorrect behavior.  Furthermore, "Lucy" is short, ends with a vowel
sound, and has an agreeable accent pattern -- so "Lucy2", "Lucy3", etc. roll
off of the tongue pretty easily.  

(On the other hand, while "LucyX" is fine, "Lucy2X" and "LucyX2" don't look
that good to me.  "LX2"?  "Lucyx2"?  "XLucy2"?  Dunno.)

Over the last couple days I've come to think that forking through "Lucy2",
"Lucy3", etc. might be our least worst option, because despite the downsides,
it allows us to adopt better development processes.  We can shorten the alpha
phase and get to version 1.0 more quickly, because we're not going to be stuck
on 1.0 forever.  Then, once we get to 1.0, we don't need to resort to extreme
measures to extend its lifespan.

Marvin Humphrey
