On Oct 18, 2006, at 8:03 PM, David Balmain wrote:
For Ruby I can use the make alternative rake. But I'm thinking about
Ferret at the moment.
Forgive me, I don't understand why you make the distinction in that
sentence between "Ruby" and "Ferret". Is there a reason you could
use rake with Lucy but not Ferret?
I can spec extra flags to CBuilder's compile() function if it turns out
to be necessary. However, CBuilder, by default, passes the same set
of flags that were used when compiling the Perl executable (which are
archived, along with a zillion other settings from Perl's Configure
script, in the Config module). On a RedHat 9 box I have access to,
those flags include -D_LARGEFILE_SOURCE and -D_FILE_OFFSET_BITS=64,
and I'm assuming that other Perl installations where LFS isn't the OS
default also spec flags rather than defining macros within individual
source files.
Unfortunately these values are defined as macros in Ruby.
Could we build a custom Charmonizer probe for Ruby then?
static char ruby_largefiles_code[] = METAQUOTE
#include "ruby.h" /* or whatever the file is */
#include "_charm.h"
int main() {
Charm_Setup;
printf("%d", (int)sizeof(off_t));
return 0;
}
METAQUOTE;
Any reason the native language needs to support LFS? If all access to
the index files is through Lucy, it shouldn't matter right?
There are two levels of support we need to consider: whether the host
language was compiled using LFS, and whether LFS is available at
all. I definitely want to avoid supporting systems that can't deal
with large files at all because I don't want to have to think about
how many bytes a file pointer might have every time I see one. File
pointers in Lucy should be 64-bit integers. Period.
As for the case where the host language may not support LFS, we
might get away with it, but I'm not a big fan of the idea, because
LFS bugs are really hard to test for and only bite you when you've
already got a lot going on. And stuff can hide in funny places like
that stat() call example.
We should make Charmonizer's implementation fail-safe, regardless.
We can add a LargeFiles_try_macros() function which adds those
#defines to the probe code. We can start off just with
_LARGEFILE_SOURCE and _FILE_OFFSET_BITS=64, getting into the more
esoteric #defines if we get failure reports.
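
Roughly, and only as a sketch: the idea is that the probe source gets the two macros prepended before any #include. The names build_probe, probe_body, and lfs_macros below are made up for illustration and are not existing Charmonizer code, and snprintf is assumed to be available.

```c
#include <stdio.h>

/* Hypothetical probe body: prints the width of off_t in bytes. */
static const char probe_body[] =
    "#include <stdio.h>\n"
    "#include <sys/types.h>\n"
    "int main(void) {\n"
    "    printf(\"%d\", (int)sizeof(off_t));\n"
    "    return 0;\n"
    "}\n";

/* The two macros named above. They must precede every #include. */
static const char lfs_macros[] =
    "#define _LARGEFILE_SOURCE\n"
    "#define _FILE_OFFSET_BITS 64\n";

/* Write probe source into buf, optionally prefixed with the LFS macros.
 * Returns 0 on success, -1 if buf is too small. */
static int build_probe(char *buf, size_t size, int try_macros) {
    int len = snprintf(buf, size, "%s%s",
                       try_macros ? lfs_macros : "", probe_body);
    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}
```

The generated source would then be compiled and run the same way as any other probe, and a result of 8 would mean the macros did their job.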
How many Ruby installs are there without LFS? I'd be shocked if
there were more than a handful of old and decrepit ones. Should we
support old versions? I don't think Ferret is, and I'd prefer not
to. KinoSearch supports only Perl 5.8.3 and later.
I propose that we probe for LFS in Ruby and bomb out if it's not
there. Then we add LargeFiles_try_macros() to ./charmonize and
define -DLUCY_RUBY as a flag to enable it when compiling charmonize.c.
#ifdef LUCY_RUBY
LargeFiles_try_macros();
#endif
LargeFiles_run(conf_fh);
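
A belt-and-braces way to "bomb out" is a compile-time guard in the C source itself. This is not the Charmonizer probe proposed above, just the old negative-array-size trick, and the typedef and function names are invented for the sketch:

```c
/* Assumption: these must come before any system header to have an effect. */
#define _LARGEFILE_SOURCE
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>

/* If off_t is narrower than 8 bytes the array size is -1,
 * which is a compile error. */
typedef char lucy_off_t_is_64_bit[(sizeof(off_t) >= 8) ? 1 : -1];

/* Width of off_t in bytes, for reporting. */
static int off_t_bytes(void) {
    return (int)sizeof(off_t);
}
```

If this compiles at all, file offsets are at least 64 bits wide in that translation unit.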
One other thing. Have you thought about detecting dirent.h in
charmonizer?
We could add a Dirent module to Charmonizer, but I'm not sure I see
immediate benefits. We'll definitely need dirent.h for Lucy, because
we need a way to list the contents of an FSDirectory/FSStore/
FSInvIndex. Fortunately, dirent.h is widely available. Building
Perl actually requires that it be available -- it's one of the few
non-ANSI C modules Perl can't live without.
The thing is, the behavior of dirent.h is predictable enough for our
purposes. Some systems provide d_namlen as a struct member, but
others don't, so if you want to write portable code you use
strlen(entry->d_name). I think that's the end of the story, isn't it? We
absolutely must have dirent.h, and we can write portable code for it
without needing the sort of pre-compile-time probing Charmonizer
provides. We don't need to worry about other struct members that may
or may not be there, and a couple of calls to strlen() on filenames
won't be a performance concern.
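
To make that concrete, here is a minimal sketch of listing a directory while relying only on d_name. The function name list_entries is made up for illustration, and skipping "." and ".." is my own choice here, not something Lucy has settled on.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Print each entry in `path` with its name length, skipping "." and "..".
 * Returns the number of entries printed, or -1 if the directory
 * can't be opened. */
static int list_entries(const char *path) {
    DIR *dir = opendir(path);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL) {
        return -1;
    }
    while ((entry = readdir(dir)) != NULL) {
        /* Portable: d_namlen may not exist, but d_name always does. */
        size_t len = strlen(entry->d_name);
        if (strcmp(entry->d_name, ".") == 0
            || strcmp(entry->d_name, "..") == 0) {
            continue;
        }
        printf("%s (%lu bytes)\n", entry->d_name, (unsigned long)len);
        count++;
    }
    closedir(dir);
    return count;
}
```

Nothing in there touches any struct member other than d_name, so no probing is needed.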
The only thing I can think of is whether readdir_r, the reentrant
version of readdir, is always available. That's something I don't
know. But I don't see anything in the Autoconf documentation about
it, so I'd gather it's always there.
I think we're closing in on the feature set Lucy needs Charmonizer to
supply. It'd be sorta nice to detect non-IEEE floats so we could
throw a meaningful error at compile-time rather than just fail
Similarity's tests on encode_norm/decode_norm. But I don't think
it's worth the effort since those systems are so rare, and I'm going
to back-burner that one.
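
For the record, a float probe could be as small as the sketch below. It assumes C99's <stdint.h> is available, the function name is made up, and it only checks the format parameters and the bit pattern of 1.0f, which is suggestive rather than proof of full IEEE 754 behavior.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if float looks like an IEEE 754 single, 0 otherwise. */
static int floats_look_ieee754(void) {
    float one = 1.0f;
    uint32_t bits = 0;

    if (sizeof(float) != sizeof(uint32_t)) {
        return 0;
    }
    if (FLT_RADIX != 2 || FLT_MANT_DIG != 24) {
        return 0;
    }
    /* Copy the raw bytes; IEEE 754 encodes 1.0f as 0x3F800000. */
    memcpy(&bits, &one, sizeof(bits));
    return bits == UINT32_C(0x3F800000);
}
```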
Filepath handling is the one big feature left I think we ought to put
in Charmonizer. That sounds ambitious, but it doesn't have to be.
Lucy basically only needs to know what the directory separator is,
because all it ever needs to do is concatenate the filename onto the
index directory. Directory names ought to be normalized to full
filepaths, but such paths are always going to have to be supplied by
the user at the native level, so we can rely upon native routines for
normalization.
Since Charmonizer is only serving one master for now, its FilePath
module can be cheesy and only supply one constant macro, DIR_SEP.
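
Something along these lines is all I have in mind. The _WIN32 test stands in for whatever Charmonizer would actually probe, join_path is a made-up helper, and snprintf is assumed to be available:

```c
#include <stdio.h>

/* Assumption: a real FilePath module would write this definition into
 * the generated config header rather than testing _WIN32 directly. */
#ifdef _WIN32
  #define DIR_SEP "\\"
#else
  #define DIR_SEP "/"
#endif

/* Concatenate `file` onto `dir` with DIR_SEP in between.
 * Returns 0 on success, -1 if the result doesn't fit in buf. */
static int join_path(char *buf, size_t size,
                     const char *dir, const char *file) {
    int len = snprintf(buf, size, "%s%s%s", dir, DIR_SEP, file);
    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}
```

Making DIR_SEP a string literal rather than a char means it can also be pasted into other literals at compile time, e.g. "index" DIR_SEP "segments".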
Are we going to need any directory reading functions in
Lucy? I use it to clear the directory when the IndexWriter create flag
is set to true but I guess this isn't really necessary.
You also need it when you read an index which resides on the
filesystem into a RAMDirectory/RAMStore/RAMInvIndex.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/