Marvin Humphrey wrote:
On Jan 11, 2007, at 6:48 AM, Michael McCandless wrote:
I too am happy that we have no more commit lock :)
Not just that. :)
No more lock directory, since we can put write.lock in the index
directory itself.
No more lock file name munging, since lock files from different indexes
no longer need to avoid collisions within a shared namespace.
No more need to deal with any files outside of the index directory.
Those three changes have a bigger impact on Lucy than they do on Lucene,
and since I'm writing a lot of KS 0.20 code with the notion that it will
be submitted to Lucy, they're having an impact on what I'm doing right
now. C doesn't provide a number of the dependencies needed to support
the old lock system, so we would either have had to include them, write
them ourselves, or supply the needed functionality via PITA callbacks to
the host language (Perl, Ruby, etc).
Since the lock directory lived in the system's tmp directory, we needed
code to discover where it was. Now we don't.
The lock file name munging required a checksum string generator. We
don't need that now.
Lastly, a failure of imagination had left me blind to the fact that we
didn't need sophisticated, portable filepath manipulating routines: just
knowing a directory separator suffices. Previously, I'd wrapped Perl's
File::Spec::Functions to make catfile() and canonpath() available from
C. That hadn't been necessary, because we could have built up the
lockfile paths given the location of the tmp directory and the dir_sep.
However, as is often the case, simplifying the implementation reveals
unnecessary cruft, and when all of a sudden everything ended up in one
directory with a splash, it became obvious that generating filepaths
didn't require heavy machinery.
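
As a minimal sketch of the point above, building the lock file path needs only string concatenation once the separator is known. The helper name join_path and its signature are hypothetical, chosen for illustration; they are not taken from KinoSearch or Lucy.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not KinoSearch or Lucy API): join a directory and a
 * file name using a caller-supplied directory separator. Returns a
 * malloc'd string that the caller must free, or NULL if allocation fails. */
static char *
join_path(const char *dir, const char *dir_sep, const char *name) {
    assert(dir != NULL && dir_sep != NULL && name != NULL);

    size_t dir_len = strlen(dir);
    size_t sep_len = strlen(dir_sep);

    /* Add the separator only if the directory is non-empty and does not
     * already end with it. */
    int ends_with_sep = dir_len >= sep_len
                        && strcmp(dir + dir_len - sep_len, dir_sep) == 0;
    int need_sep = dir_len > 0 && !ends_with_sep;

    size_t size = dir_len + (need_sep ? sep_len : 0) + strlen(name) + 1;
    char *path = malloc(size);
    if (path == NULL) {
        return NULL;
    }
    snprintf(path, size, "%s%s%s", dir, need_sep ? dir_sep : "", name);
    return path;
}
```

Under these assumptions, join_path("/path/to/index", "/", "write.lock") would yield "/path/to/index/write.lock", with no canonicalization and no host-language callback involved.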
But I have to say the lockless changes pale in comparison to what you
have done/are doing with KinoSearch; specifically, the clean merge
model with an external sorter and other related file format changes
look very interesting.
Ooh, excellent points!
In fact, we haven't done this follow-through for Lucene, but I think we
now should? I think having only one directory (the index directory)
where things happen, and a simple file name for the write lock
("write.lock"), is a great simplification for our users.
Now that readers are read-only, I think it makes sense to default the
write lock into the index directory, and as you describe, no longer
generate a "unique namespace" hash lock ID since the index dir gives
us that scoping.
Are there any reasons not to do this? I will open a JIRA issue to
track this.
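
To make the proposal concrete, here is a rough sketch of a write lock that lives inside the index directory under the fixed name "write.lock". It is written in C against POSIX open() as an analogue only; it is not Lucene's (Java) implementation, the function names are hypothetical, and a "/" separator is assumed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format "<index_dir>/write.lock" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int
write_lock_path(const char *index_dir, char *buf, size_t buf_size) {
    assert(index_dir != NULL && buf != NULL);
    int n = snprintf(buf, buf_size, "%s/write.lock", index_dir);
    return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}

/* Hypothetical helper: try to take the write lock for one index.
 * Returns 0 if the lock was obtained, -1 if it is already held or if
 * an error occurred. */
static int
obtain_write_lock(const char *index_dir) {
    char path[4096];
    if (write_lock_path(index_dir, path, sizeof(path)) != 0) {
        return -1;
    }
    /* O_CREAT | O_EXCL makes creation fail if the file already exists,
     * so at most one caller can create the lock file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) {
        return -1;
    }
    close(fd);
    return 0;
}

/* Hypothetical helper: release the lock by deleting the lock file.
 * Returns 0 on success, -1 on error. */
static int
release_write_lock(const char *index_dir) {
    char path[4096];
    if (write_lock_path(index_dir, path, sizeof(path)) != 0) {
        return -1;
    }
    return unlink(path) == 0 ? 0 : -1;
}
```

Because the lock file sits in the index directory, two different indexes can never collide on its name, so no hashed prefix is needed. A lock file left behind by a crashed writer would still have to be cleared by some other means; this sketch does not address that.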
Well, I look forward to seeing whether you can suggest improvements on
some of the algos I'll bring up in this forum once KS 0.20_01 is out. :)
I will try, but I'm already behind just trying to understand how we
could improve Lucene based on your current KS release! Is there any
preview/general summary of what's being done for KS 0.20/Lucy? I tried
to quickly search the KS archives and look through Lucy's archives but
didn't find any solid hits.
Mike