On Jul 21, 2006, at 1:23 AM, Nicolas Lalevée wrote:
In fact, that was my first implementation. The problem with that is
that you can only store one value. But thinking a little more about
it, storing one or more values is not an issue, because with the
solution I proposed, no space is saved at all.

In fact, when I thought about this format of field metadata, I was
thinking about a way to let the Lucene user specify how to store it in
the Lucene index format. For instance, the simple one would specify
that it's a pointer to some metadata (as you proposed), another one
would specify that there are two pointers (in my use case, one for the
type, the other for the language), and another one would specify that
it will be stored directly, as it is actually an integer (so no need to
make a pointer to an integer). But it was just a thought, I don't know
if it is possible. WDYT?

I'm thinking that there would be a codecs file, say with the
extension .cdx and this format:
Codecs (.cdx) --> CodecCount, <CodecClassName>CodecCount
CodecCount --> Uint32
CodecClassName --> String
That file would be read in its entirety when the index was
initialized and expanded into an array of codec objects, one per
CodecClassName.
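
As a rough illustration of the .cdx layout above, here is a minimal Java sketch that writes and reads a CodecCount followed by that many class names. The class name CodecsFileSketch is hypothetical, and DataOutputStream.writeUTF / DataInputStream.readUTF are used only as a stand-in for the index format's String type, which is an assumption made to keep the example self-contained. Turning each name into a codec object (for example by reflection) is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene.
class CodecsFileSketch {

    // Codecs (.cdx) --> CodecCount, then CodecCount class names.
    static byte[] write(String[] classNames) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(classNames.length);   // CodecCount as a 32-bit int
            for (String name : classNames) {
                out.writeUTF(name);            // stand-in for the String type
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file in its entirety; the array index is the codec number.
    static String[] read(byte[] cdx) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(cdx));
            int codecCount = in.readInt();
            String[] classNames = new String[codecCount];
            for (int i = 0; i < codecCount; i++) {
                classNames[i] = in.readUTF();
            }
            return classNames;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that Java's int is signed, so a true Uint32 count above 2^31 - 1 would need extra handling; that seems unlikely to matter for a list of codec classes.
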
The .fdx file would add an additional int per doc...
FieldIndex (.fdx) --> <FieldValuesPosition, FieldValuesCodecNumber>SegSize
FieldValuesPosition --> Uint64
FieldValuesCodecNumber --> Uint32
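
Because each .fdx entry in the layout above has a fixed width (8 bytes of position plus 4 bytes of codec number), the entry for a document can be found by simple arithmetic. The following Java sketch shows this with an in-memory ByteBuffer; FieldIndexSketch and its method names are hypothetical, and big-endian byte order is assumed.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene.
class FieldIndexSketch {

    // One entry = FieldValuesPosition (64 bits) + FieldValuesCodecNumber (32 bits).
    static final int ENTRY_BYTES = Long.BYTES + Integer.BYTES;

    // Builds an in-memory .fdx image with one entry per document.
    static ByteBuffer write(long[] positions, int[] codecNumbers) {
        ByteBuffer fdx = ByteBuffer.allocate(positions.length * ENTRY_BYTES);
        for (int doc = 0; doc < positions.length; doc++) {
            fdx.putLong(positions[doc]);
            fdx.putInt(codecNumbers[doc]);
        }
        return fdx;
    }

    // Offset into the .fdt file where this document's field values start.
    static long position(ByteBuffer fdx, int doc) {
        return fdx.getLong(doc * ENTRY_BYTES);
    }

    // Index into the codecs array loaded from the .cdx file.
    static int codecNumber(ByteBuffer fdx, int doc) {
        return fdx.getInt(doc * ENTRY_BYTES + Long.BYTES);
    }
}
```

The absolute getLong and getInt calls do not move the buffer's position, so lookups for different documents can be made in any order.
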
Now, before you read any data from the .fdt file, you know how to
interpret it. You seek the .fdt IndexInput to the right spot, then
feed it to the appropriate codec object from the codecs array. The
codec does the rest. In your case, you might write a codec that
would read a few bytes and strings of metadata up front. Or you
might have several different codecs, the identity of which indicates
fixed values for certain metadata fields: FrenchDocument,
ArabicDocument, etc.
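
The dispatch step described above might look roughly like the following Java sketch. Everything here is hypothetical: the FieldCodec interface, the TaggedDocument codec, and the readUTF-based stand-in encoding are assumptions for illustration, not existing Lucene APIs. FrenchDocument shows a codec whose identity fixes the language; TaggedDocument shows one that reads metadata up front.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch, not part of Lucene.
class CodecDispatchSketch {

    interface FieldCodec {
        String decode(DataInputStream fdt) throws IOException;
    }

    // The codec's identity implies the language; only the text is stored.
    static class FrenchDocument implements FieldCodec {
        public String decode(DataInputStream fdt) throws IOException {
            return "fr:" + fdt.readUTF();
        }
    }

    // Reads a language string as metadata up front, then the text.
    static class TaggedDocument implements FieldCodec {
        public String decode(DataInputStream fdt) throws IOException {
            String language = fdt.readUTF();
            return language + ":" + fdt.readUTF();
        }
    }

    // Test helper: concatenates strings in the stand-in encoding.
    static byte[] encode(String... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String value : values) {
                out.writeUTF(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Seeks" to position, then hands the stream to the codec chosen by number.
    static String readDoc(byte[] fdt, long position, int codecNumber, FieldCodec[] codecs) {
        int start = (int) position; // in-memory sketch: small offsets only
        try {
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(fdt, start, fdt.length - start));
            return codecs[codecNumber].decode(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the reader never inspects the .fdt bytes itself; the position and codec number from the .fdx entry are enough to hand off to the right codec.
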
Would that scheme meet your needs?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/