On Mar 6, 2009, at 4:58 PM, Marvin Humphrey wrote:
Grant,
I am currently employed by Eventful, Inc., in San Diego, CA. They are paying
me to work full-time on KinoSearch and Lucy.

I went out of my way when we negotiated the terms of my employment to ensure
that there was no way my contract could hamper or compromise progress towards
Lucy. The actual document is confidential of course, but I feel comfortable
saying that first, our lawyers hammered out the legal nuts and bolts to my
satisfaction, and second, Eventful is fully on board with regard to Lucy. By
way of illustration, my boss regularly hassles me about publishing a Lucy C
API, even though, since Eventful uses the Perl bindings, the benefits would be
indirect.

This just further underscores my point. Lucy cannot be just about you (and
your employer) contributing code that you develop in-house at Eventful. A
project must be able to survive any single committer leaving it, and, simply
put, Lucy does not meet that criterion. In the early stages, yes, one
committer often gets things going, but Lucy has been around for a fairly long
time on life support, and you only seem to pop up on the list when nudged by
the PMC.

In my opinion, it is not in the best interests of the Apache Lucene project to
make it more difficult for my employer and myself to contribute.

I agree, but unfortunately, it is Lucy that has languished for a good long
time.

It is fairly apparent to me that the Lucy project is not making any progress
community-wise or code-wise. Neither Marvin, Dave, nor Doug is active at all
on it, and that accounts for all three committers. There has been very little
mailing list traffic,

You may have noticed that up until about three weeks ago (when I dove back
into the code cave), I was quite active on java-dev and in the Lucene JIRA
forums. Significant design innovations were realized, particularly in the
area of real-time search.

In the past, many designs have been hashed out cooperatively on the KinoSearch
and Lucy mailing lists: the Schema class, revisions to QueryParser and the
boolean Query hierarchy, the implementation of human-readable index metadata,
C configuration probing, the OO model, index designs which exploit memory
mapping, and so on.

In this particular case, however, I was assigned the task of solving real-time
search, for which the Lucy and KinoSearch forums were not ideal. There is a
very limited number of people who have both the familiarity with the
Lucene/Lucy segment-based inverted index model and the interest to discuss
real-time search at the level I desired, where concepts like "segment-centric
search" could be bandied about. Basically, I needed Mike McCandless, so I
went to where he could be found.

The conversations that we had in JIRA and on java-dev were beneficial to both
Lucene and Lucy. Should I have posted to the Lucy dev list instead, simply to
demonstrate activity, when that would have been less useful to Mike, to me,
to Lucy, and to Lucene? To my mind, the Lucene community is also part of the
Lucy community. Mike's insights were welcome and useful, and it didn't seem
important to me which specific mailing list they wound up on; they're all
under the domain lucene.apache.org, after all. Weren't we all moving forward
together, and wouldn't that be apparent to members of the PMC such as
yourself? Or is this a zero-sum game where design innovations which help Lucy
don't count as "progress" if they also help Lucene?

That's all fine, but none of it adds up to people looking at Lucy and saying,
"Gee, I want to contribute to Lucy."

Furthermore, I have my doubts about the development process being employed,
which seems to be the notion that KinoSearch is going to be donated by Marvin
at some point in the future [1]. That would only work if it were to go
through the Software Grant or Incubation process (which I would be happy to
support), or at least that is how I understand the process to be when code is
developed outside of the ASF.

I understand why you might have thought that, but that's not how things will
play out, and it's a misreading of the post that you cite.
(<http://www.lucidimagination.com/search/document/152a1a9d00b7d08a/is_there_anybody_here>)

As you note, simply importing KinoSearch wholesale into the Lucy repository
with cosmetic changes would violate the terms of the project. But even if
that were possible, it would represent a *horrendous missed opportunity*.

A KinoSearch 1.0 release, with permanent API and file format backwards
compatibility guarantees (i.e. "there will never be a KinoSearch 2.0"), will
be very beneficial for Lucy's development. Imposing such discipline allows
library users to proceed with maximum confidence. For instance, it allows
Peter Karman, who has long planned to build a KS backend for Swish, to move
forward without having to worry about the upstream library pulling the rug
out from underneath his users.

Going that route will maximize our ability to learn the limitations and
weaknesses of the design. Using the knowledge we gain, we can then forge
ahead as we have in the past: chunk by chunk, class by class. And even though
I am very pleased with how pluggable index components, C API user interface
improvements, "OS-as-JVM" file format changes, and so on are coming along, I
anticipate lots of healthy debate and major discrepancies between what ends
up in KS 1.0 and what ends up in Lucy.

Even if KS were the plan, in looking at KS, it seems there is not much
community activity there, either.

This is largely due to the fact that it has been a long time since I released
any significant public updates. I choose to release significant updates
infrequently because breaking backwards compatibility has severe consequences
for CPAN modules: as soon as the install completes, live apps start crashing.
Since there is no sane deprecation mechanism for dynamically loaded Perl
modules, minimizing backwards compatibility problems is a responsibility I
take seriously.

On the flip side, one might ask: what's the harm in letting it stand as is?
Admittedly, not much, other than I think it confuses people b/c they think
there is a C port of Lucene, and then they go and find it is dead.

Indeed. It's not like Lucy in its present form causes harm to the bottom line
of Lucid Imagination, Inc. ;)

What's that got to do with anything? Give me a break. I'm not attacking you.
I'm just stating that Lucy has not had any code or any community built for
over three years.

Therefore, it is with some hesitation that I suggest we mothball Lucy.
Mostly, I hesitate because I hate to see any project be archived in the hope
that someone will come in and pick it up. However, I just don't see that
happening. If Marvin wishes to resurrect it, he can donate KS (or whatever
core part of it is Lucy), go through incubation, and prove there is a
community, and then we can turn it back on.

Please give me two to three months to make the next dev release of
KinoSearch. FWIW, if I can't get a release out within that time frame, I'm
going to have to answer to Eventful. :)

This release will introduce real-time search, improved subclassing support, an
mmap-friendly index file format, and pluggable indexing components. I suspect
aspects of it may be of interest to the Java Lucene dev community, but if
that's the case, I won't hold it against you. ;)

Again, this is all great, but it just further demonstrates that you are doing
this on your own and not as a part of the Lucy community (or really, even the
Lucene community). It's not a judgment of you or of KS. I really like what you
are doing. It's merely a statement that this is not how Apache works. There
are plenty of other places to host code that do not have these requirements.