On Jul 31, 2006, at 8:25 AM, Nicolas Lalevée wrote:

That looks good, but there is one restriction: it has to be per document.

Yes, what I laid out was per-document - for each document, the fdx file would keep a file pointer and an integer mapping to a codec.
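To make the per-document idea concrete, here is a minimal sketch of such an index. It assumes a fixed-width record per document (an 8-byte file pointer followed by a 4-byte codec id), so the entry for document N sits at byte offset N * 12. The class name `FdxSketch`, the record width, and the method names are all hypothetical, chosen for illustration; this is not Lucene's actual .fdx layout.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch only: NOT Lucene's real .fdx format.
// Assumes one fixed-width record per document: [long filePointer][int codecId].
final class FdxSketch {
    static final int RECORD_BYTES = Long.BYTES + Integer.BYTES; // 8 + 4 = 12

    // Serialize one record per document, in document-number order.
    static byte[] write(long[] filePointers, int[] codecIds) {
        if (filePointers.length != codecIds.length) {
            throw new IllegalArgumentException("one codec id is required per file pointer");
        }
        ByteBuffer buf = ByteBuffer.allocate(filePointers.length * RECORD_BYTES);
        for (int doc = 0; doc < filePointers.length; doc++) {
            buf.putLong(filePointers[doc]); // where the document starts in the data file
            buf.putInt(codecIds[doc]);      // which codec knows how to deserialize it
        }
        return buf.array();
    }

    // Fixed-width records allow random access by document number.
    static long filePointer(byte[] fdx, int docNum) {
        return ByteBuffer.wrap(fdx).getLong(docNum * RECORD_BYTES);
    }

    static int codecId(byte[] fdx, int docNum) {
        return ByteBuffer.wrap(fdx).getInt(docNum * RECORD_BYTES + Long.BYTES);
    }
}
```

With this layout, looking up a document costs one seek into the index and one seek into the data file, and the codec id says how to interpret the bytes found there.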

In fact, I was thinking about a more generic version that would preserve format
compatibility, keeping .fdx as is:

FieldData (.fdt) --> <DocFieldData>^SegSize
DocFieldData --> FieldCount, <FieldNum, RawData>^FieldCount

The default FieldsDataWriter would be the current one; it would read the
RawData as Bits, Value, with Value --> String | BinaryValue, ...
Then, for my app, I will provide a custom FieldsDataWriter that does
exactly what I want.

OK, that's quite similar, but with the info specifying how to deserialize the document stored in fdt rather than fdx. However, I don't think what you're describing makes the field storage in Lucene arbitrarily extensible, since you're just going to override FieldsWriter/FieldsReader rather than modify them so that they can use arbitrary codecs.

I think what I want to do is turn Lucene into an Object-Oriented Database, or at least have Lucene adopt some characteristics of an ODBMS. However, I haven't used a real ODBMS and I'm not up on the theory, so I can't say for sure. I've been doing a little reading here and there on object databases, but I've been extraordinarily busy the last few weeks and haven't been able to study it in depth.

The main point is this:

Lucene users have diverse needs for what gets stored in the document/field storage. We've been meeting those needs by assigning more and more bit flags. That can't continue ad infinitum. However, we *can* meet everyone's needs by applying a variant of the "Replace Conditionals With Polymorphism" refactoring technique...

http://xrl.us/p3kn (Link to www.eli.sdsu.edu)

Think of those bit flags as an if-else chain. Instead of all those conditionals describing all the attributes of the Lucene Document you want to store at that file pointer, we allow you to put whatever kind of serialized object you desire there. Maybe it's a Lucene Document. Maybe it's a FrenchDocument. Maybe it's a RussianDocument. Maybe it's a wrapped-up jpg. You choose.

Instead of continually adding to the complexity of the deserialization algorithm, we make that deserialization algorithm user-definable.
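As a rough illustration of the refactoring, the sketch below replaces the if-else chain with a lookup from an integer codec id to an object that knows how to deserialize the stored bytes. Every name in it (`DocCodec`, `Utf8TextCodec`, `RawBytesCodec`, `CodecRegistry`) is hypothetical and invented for this example; none of them exist in Lucene, and a real design would need to settle how codec ids are assigned and persisted.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: one implementation per kind of stored object.
interface DocCodec {
    byte[] serialize(Object doc);
    Object deserialize(byte[] raw);
}

// Example codec: stores a document as UTF-8 text.
final class Utf8TextCodec implements DocCodec {
    public byte[] serialize(Object doc) {
        return ((String) doc).getBytes(StandardCharsets.UTF_8);
    }
    public Object deserialize(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}

// Example codec: stores an opaque blob (say, a wrapped-up jpg) untouched.
final class RawBytesCodec implements DocCodec {
    public byte[] serialize(Object doc) {
        return ((byte[]) doc).clone();
    }
    public Object deserialize(byte[] raw) {
        return raw.clone();
    }
}

// The reader holds no per-type conditionals; it only dispatches on the codec id.
final class CodecRegistry {
    private final Map<Integer, DocCodec> codecs = new HashMap<>();

    void register(int codecId, DocCodec codec) {
        codecs.put(codecId, codec);
    }

    Object read(int codecId, byte[] raw) {
        DocCodec codec = codecs.get(codecId);
        if (codec == null) {
            throw new IllegalArgumentException("unknown codec id: " + codecId);
        }
        return codec.deserialize(raw);
    }
}
```

Supporting a new kind of stored object then means registering one more codec, rather than assigning another bit flag and adding another branch to the reader.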

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

