Greets,

There are some portability problems that may not be worth solving.

On some Crays, ints, longs and pointers are all 8 bytes (the ILP64 data model). I propose not supporting any machine where we can't guarantee that lucy_i8_t is 1 byte and lucy_i32_t is 4 bytes.
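To make that guarantee concrete, here is a minimal sketch of a compile-time check. The typedefs are hard-coded purely for illustration (a real build would presumably pick them with a configuration probe), and LUCY_STATIC_ASSERT and lucy_int_widths_ok are hypothetical names, not existing Lucy code. On an ILP64 box the third assertion would stop the compile, which is the "not supported" outcome proposed above.

```c
#include <limits.h>

/* Hypothetical typedefs, hard-coded for illustration only. */
typedef signed char lucy_i8_t;
typedef int         lucy_i32_t;

/* C89-friendly compile-time assertion (hypothetical macro): the array
 * size becomes -1, which is a compile error, when cond is false. */
#define LUCY_STATIC_ASSERT(cond, name) \
    typedef char lucy_static_assert_##name[(cond) ? 1 : -1]

LUCY_STATIC_ASSERT(CHAR_BIT == 8, char_is_8_bits);
LUCY_STATIC_ASSERT(sizeof(lucy_i8_t) == 1, i8_is_1_byte);
LUCY_STATIC_ASSERT(sizeof(lucy_i32_t) == 4, i32_is_4_bytes);

/* Runtime mirror of the same checks, handy for a configure-style probe. */
int lucy_int_widths_ok(void)
{
    return CHAR_BIT == 8
        && sizeof(lucy_i8_t) == 1
        && sizeof(lucy_i32_t) == 4;
}
```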

A second esoteric problem is machines that don't use IEEE 754 for floats: <http://www.codeproject.com/tools/libnumber.asp>. I think the norms-encoding routine will break on such machines. That ought to be the only problem, but it's gnarly enough that I think we should just decide not to support those boxes.
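If we do refuse those boxes, a probe along these lines could detect them. This is only a heuristic sketch: it checks the <float.h> parameters and the bit pattern of 1.0f, which is not a proof of full IEEE 754 conformance. The function name lucy_float_looks_ieee754 is made up for the example.

```c
#include <float.h>
#include <string.h>

/* Returns 1 if float and double look like IEEE 754 single and double
 * precision, 0 otherwise. Heuristic only. */
int lucy_float_looks_ieee754(void)
{
    float one = 1.0f;
    unsigned char b[4];

    /* Radix, precision and exponent range of the two binary formats. */
    if (FLT_RADIX != 2 || FLT_MANT_DIG != 24 || FLT_MAX_EXP != 128) {
        return 0;
    }
    if (DBL_MANT_DIG != 53 || DBL_MAX_EXP != 1024) {
        return 0;
    }
    if (sizeof(float) != 4 || sizeof(double) != 8) {
        return 0;
    }

    /* 1.0f encodes as 0x3F800000; accept either byte order. */
    memcpy(b, &one, 4);
    if (b[0] == 0x3F && b[1] == 0x80 && b[2] == 0x00 && b[3] == 0x00) {
        return 1; /* big-endian layout */
    }
    if (b[3] == 0x3F && b[2] == 0x80 && b[1] == 0x00 && b[0] == 0x00) {
        return 1; /* little-endian layout */
    }
    return 0;
}
```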

Another wrinkle is large file support. Machines that don't support large files are growing scarcer by the day, but eventually, somebody who has one will want to use Lucy. Index files can get pretty big.

Is it even possible for a machine to have large file support and not provide a 64-bit integer? The only thing Lucene ever uses 64-bit integers for is file pointers. KinoSearch takes advantage of this in a weird way -- it uses doubles wherever Lucene uses Java longs. I did it that way because Perl always provides support for doubles, whereas 64-bit integer support takes a special compile and generally doesn't work very well. The 52-bit mantissa in an IEEE 754 double is more than enough for any file pointer. But when I made that call, I was using native Perl filehandles as InStream objects; KinoSearch doesn't do that anymore, and I don't think we should go the doubles-as-file-pointers route with Lucy (even though it Just Works).
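For anyone who wants to see why the doubles trick works, here is a small sketch, assuming IEEE 754 doubles. With the implicit leading bit, the 52 stored mantissa bits give 53 bits of precision, so every whole number up to 2^53 is exact and offsets only start to collide beyond that. The function names are invented for the example.

```c
#include <float.h>

/* 2^DBL_MANT_DIG: every non-negative whole number up to this value is
 * exactly representable in a double (2^53 for IEEE 754 doubles). */
double lucy_max_exact_offset(void)
{
    double limit = 1.0;
    int i;
    for (i = 0; i < DBL_MANT_DIG; i++) {
        limit *= 2.0;
    }
    return limit;
}

/* Returns 1 if offset + 1 is distinguishable from offset, i.e. the
 * double can still step through file positions one byte at a time.
 * volatile forces the sum to be stored at double precision. */
int lucy_next_offset_distinct(double offset)
{
    volatile double next = offset + 1.0;
    return next != offset;
}
```

Below 2^53 (roughly 9.0e15 bytes) consecutive offsets stay distinct; at 2^53 itself, adding 1 no longer changes the value.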

I'm inclined to require both large file support and 64-bit integers for Lucy. What say?
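As a rough sketch of what enforcing both requirements might look like on a POSIX-ish system: the probe below assumes that defining _FILE_OFFSET_BITS to 64 before any system header is how large files get switched on (true for glibc, not universal), and that the compiler offers long long. The function names are hypothetical.

```c
/* Must come before any system header to have an effect. */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <sys/types.h>

/* 1 if off_t can address files larger than 2 GB, 0 otherwise. */
int lucy_has_large_files(void)
{
    return sizeof(off_t) * CHAR_BIT >= 64;
}

/* 1 if the compiler provides an integer type of at least 64 bits.
 * Assumes long long exists; a strict C89 compiler may reject it. */
int lucy_has_i64(void)
{
    return sizeof(long long) * CHAR_BIT >= 64;
}

/* Both requirements together. */
int lucy_platform_supported(void)
{
    return lucy_has_large_files() && lucy_has_i64();
}
```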

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
