On Jul 21, 2006, at 1:23 AM, Nicolas Lalevée wrote:
In fact, that was my first implementation. The problem with that is you can only store one value. But thinking a little more about it, storing one or more values is not an issue, because with the solution I proposed, no space is saved at all.
In fact, when I thought about this format for field metadata, I was thinking about a way to let the Lucene user specify how it is stored in the Lucene index format. For instance, the simplest one would specify that it's a pointer to some metadata (as you proposed), another one would specify that there are two pointers (in my use case, one for the type, the other for the language),
and another one would specify that it is stored directly, since it is
actually an integer (so there is no need for a pointer to an integer). But it was just
a thought; I don't know if it is possible. WDYT?

I'm thinking that there would be a codecs file, say with the extension .cdx, with this format:

  Codecs (.cdx)  --> CodecCount, <CodecClassName>^CodecCount
  CodecCount     --> Uint32
  CodecClassName --> String

That file would be read in its entirety when the index is opened, and its contents expanded into an array of codec objects, one per CodecClassName.
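A minimal Java sketch of that load step, under stated assumptions: FieldValuesCodec, PlainTextCodec, FrenchDocumentCodec and CodecsFile are hypothetical names, not Lucene classes; Uint32 is taken to be a 4-byte big-endian int and String is Java's modified UTF-8 as written by DataOutputStream.writeUTF (which is not how Lucene encodes its own Strings); and every codec class is assumed to have a no-argument constructor the loader can call reflectively.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec contract: decode one document's stored bytes from .fdt.
interface FieldValuesCodec {
    Map<String, String> readDocument(DataInputStream fdt) throws IOException;
}

// Hypothetical codec that stores only a body string.
final class PlainTextCodec implements FieldValuesCodec {
    public Map<String, String> readDocument(DataInputStream fdt) throws IOException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("body", fdt.readUTF());
        return doc;
    }
}

// Hypothetical codec whose identity implies language = "fr".
final class FrenchDocumentCodec implements FieldValuesCodec {
    public Map<String, String> readDocument(DataInputStream fdt) throws IOException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("language", "fr");
        doc.put("body", fdt.readUTF());
        return doc;
    }
}

final class CodecsFile {
    // Writes CodecCount followed by each CodecClassName (assumed encodings).
    static byte[] write(String... classNames) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(classNames.length);
        for (String name : classNames) {
            out.writeUTF(name);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the whole .cdx content and instantiates one codec per class name;
    // a codec's index in the returned array is its codec number.
    static FieldValuesCodec[] read(byte[] cdx) throws IOException, ReflectiveOperationException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(cdx));
        int codecCount = in.readInt();
        FieldValuesCodec[] codecs = new FieldValuesCodec[codecCount];
        for (int i = 0; i < codecCount; i++) {
            String className = in.readUTF();
            codecs[i] = (FieldValuesCodec) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        }
        return codecs;
    }
}
```

Resolving class names once, when the index is opened, keeps per-document reads cheap: afterwards a codec number is just an array index.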

The .fdx file would gain one additional int per document:

  FieldIndex (.fdx) -->  <FieldValuesPosition,
                          FieldValuesCodecNumber>^SegSize
  FieldValuesPosition    --> Uint64
  FieldValuesCodecNumber --> Uint32
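Because both values are fixed width, the per-document arithmetic stays simple: each entry grows from 8 bytes (the Uint64 alone) to 12 bytes (Uint64 + Uint32), so the entry for document n starts at n * 12. A minimal Java sketch of that lookup, assuming the .fdx bytes are already in memory as a big-endian java.nio.ByteBuffer; FdxEntry and FieldIndex are hypothetical names, not Lucene classes.

```java
import java.nio.ByteBuffer;

// One decoded .fdx entry under the proposed format.
final class FdxEntry {
    final long fieldValuesPosition;   // byte offset of the document in .fdt
    final int fieldValuesCodecNumber; // index into the codecs array from .cdx

    FdxEntry(long fieldValuesPosition, int fieldValuesCodecNumber) {
        this.fieldValuesPosition = fieldValuesPosition;
        this.fieldValuesCodecNumber = fieldValuesCodecNumber;
    }
}

final class FieldIndex {
    // Uint64 + Uint32: every entry is a fixed 12 bytes.
    static final int ENTRY_SIZE = Long.BYTES + Integer.BYTES;

    // Byte offset of document docNum's entry within .fdx.
    static long entryOffset(int docNum) {
        return (long) docNum * ENTRY_SIZE;
    }

    // Absolute reads, so the buffer's own position is left untouched.
    static FdxEntry read(ByteBuffer fdx, int docNum) {
        int offset = Math.toIntExact(entryOffset(docNum));
        long position = fdx.getLong(offset);
        int codecNumber = fdx.getInt(offset + Long.BYTES);
        return new FdxEntry(position, codecNumber);
    }
}
```

A real reader would seek an IndexInput to entryOffset(n) rather than hold the whole file in memory; the fixed 12-byte stride is what makes that seek possible without any extra index.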

Now, before you read any data from the .fdt file, you know how to interpret it. You seek the .fdt IndexInput to the right spot, then feed it to the appropriate codec object from the codecs array. The codec does the rest. In your case, you might write a codec that would read a few bytes and strings of metadata up front. Or you might have several different codecs, the identity of which indicates fixed values for certain metadata fields: FrenchDocument, ArabicDocument, etc.
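The read path described above can be sketched in Java as follows. It is not Lucene code: it defines a hypothetical FieldValuesCodec contract, holds .fdt in a byte array instead of an IndexInput, and assumes strings are stored with DataOutputStream.writeUTF. MetadataHeaderCodec (metadata read up front) and FrenchDocumentCodec (metadata implied by the codec's identity) are hypothetical illustrations of the two options.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec contract: decode one document's stored bytes from .fdt.
interface FieldValuesCodec {
    Map<String, String> readDocument(DataInputStream fdt) throws IOException;
}

// Reads explicit metadata up front: a type string, a language string, then the body.
final class MetadataHeaderCodec implements FieldValuesCodec {
    public Map<String, String> readDocument(DataInputStream fdt) throws IOException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("type", fdt.readUTF());
        doc.put("language", fdt.readUTF());
        doc.put("body", fdt.readUTF());
        return doc;
    }
}

// The codec's identity fixes the language, so no language bytes are stored per doc.
final class FrenchDocumentCodec implements FieldValuesCodec {
    public Map<String, String> readDocument(DataInputStream fdt) throws IOException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("language", "fr");
        doc.put("body", fdt.readUTF());
        return doc;
    }
}

final class StoredFieldsReader {
    private final byte[] fdt;                // whole .fdt content, in memory for the sketch
    private final FieldValuesCodec[] codecs; // loaded from .cdx, indexed by codec number

    StoredFieldsReader(byte[] fdt, FieldValuesCodec[] codecs) {
        this.fdt = fdt;
        this.codecs = codecs;
    }

    // position and codecNumber come from the document's .fdx entry.
    Map<String, String> document(long position, int codecNumber) throws IOException {
        int start = Math.toIntExact(position);
        // Opening the stream at 'start' stands in for seeking the .fdt IndexInput.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(fdt, start, fdt.length - start));
        // The codec is chosen before any .fdt byte is read; it does the rest.
        return codecs[codecNumber].readDocument(in);
    }
}
```

Encoding metadata in the codec's identity, as FrenchDocumentCodec does, saves the per-document language string at the cost of one codec class per value; the header-style codec remains the general fallback.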

Would that scheme meet your needs?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

