On Jul 31, 2006, at 8:25 AM, Nicolas Lalevée wrote:
> That looks good, but there is one restriction: it has to be per
> document.
Yes, what I laid out was per-document - for each document, the fdx
file would keep a file pointer and an integer mapping to a codec.
> In fact I was thinking about a more generic version that would allow
> format compatibility, keeping .fdx as is:
>
>   FieldData (.fdt) --> <DocFieldData>SegSize
>   DocFieldData --> FieldCount, <FieldNum, RawData>FieldCount
>
> The default FieldsDataWriter would be the current one; it will read
> the RawData as Bits, Value, with Value --> String | BinaryValue, ...
>
> Then, for my app, I will provide a custom FieldsDataWriter that will
> do exactly what I want.
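To make the quoted layout concrete, here is a minimal sketch of one DocFieldData record being written and read back. It is not Lucene code: the class name DocFieldDataSketch is made up, fixed-width ints stand in for whatever integer encoding the real files use, and the length prefix in front of each RawData is my assumption, since the grammar above doesn't say how a generic reader finds the end of a field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of: DocFieldData --> FieldCount, <FieldNum, RawData>FieldCount
class DocFieldDataSketch {

    // Serialize one document's fields: FieldCount, then FieldNum + RawData per field.
    static byte[] write(SortedMap<Integer, byte[]> fields) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(fields.size());               // FieldCount
            for (Map.Entry<Integer, byte[]> e : fields.entrySet()) {
                out.writeInt(e.getKey());              // FieldNum
                out.writeInt(e.getValue().length);     // assumed length prefix for RawData
                out.write(e.getValue());               // RawData, opaque to this layer
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Read the record back without interpreting RawData at all.
    static SortedMap<Integer, byte[]> read(byte[] record) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
            int fieldCount = in.readInt();
            SortedMap<Integer, byte[]> fields = new TreeMap<>();
            for (int i = 0; i < fieldCount; i++) {
                int fieldNum = in.readInt();
                byte[] raw = new byte[in.readInt()];
                in.readFully(raw);
                fields.put(fieldNum, raw);
            }
            return fields;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

In this sketch, how RawData gets interpreted (Bits, Value, ...) is left entirely to whoever consumes the byte arrays, which is where a default or custom FieldsDataWriter would come in.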
OK, that's quite similar, but with the info specifying how to
deserialize the document stored in fdt rather than fdx. However, I
don't think what you're describing makes the field storage in Lucene
arbitrarily extensible, since you're just going to override
FieldsWriter/FieldsReader rather than modify them so that they can
use arbitrary codecs.
I think what I want to do is turn Lucene into an Object-Oriented
Database, or at least have Lucene adopt some characteristics of an
ODBMS. However, I haven't used a real ODBMS and I'm not up on the
theory, so I can't say for sure. I've been doing a little reading
here and there on object databases, but I've been extraordinarily
busy the last few weeks and haven't been able to study it in depth.
The main point is this:
Lucene users have diverse needs for what gets stored in the document/
field storage. We've been meeting those needs by assigning more and
more bit flags. That can't continue ad infinitum. However, we
*can* meet everyone's needs by applying a variant of the "Replace
Conditionals With Polymorphism" refactoring technique...
http://xrl.us/p3kn (Link to www.eli.sdsu.edu)
Think of those bit flags as an if-else chain. Instead of all those
conditionals describing all the attributes of the Lucene Document you
want to store at that file pointer, we allow you to put whatever kind
of serialized object you desire there. Maybe it's a Lucene
Document. Maybe it's a FrenchDocument. Maybe it's a
RussianDocument. Maybe it's a wrapped-up jpg. You choose.
Instead of continually adding to the complexity of the
deserialization algorithm, we make that deserialization algorithm
user-definable.
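
Here is a rough sketch of what I mean, in Java. None of these names exist in Lucene; DocCodec, Utf8StringCodec, RawBytesCodec and CodecRegistry are hypothetical, and the integer id stands in for the per-document codec number that the fdx file would keep next to the file pointer. The reader dispatches on that id rather than testing bit flags.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: one implementation per kind of stored object.
interface DocCodec {
    byte[] serialize(Object doc);
    Object deserialize(byte[] raw);
}

// Illustrative codec: stores a plain string as UTF-8 bytes.
class Utf8StringCodec implements DocCodec {
    public byte[] serialize(Object doc) {
        return ((String) doc).getBytes(StandardCharsets.UTF_8);
    }
    public Object deserialize(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}

// Illustrative codec: stores opaque bytes (a wrapped-up jpg, say) untouched.
class RawBytesCodec implements DocCodec {
    public byte[] serialize(Object doc) {
        return ((byte[]) doc).clone();
    }
    public Object deserialize(byte[] raw) {
        return raw.clone();
    }
}

// Maps the integer stored per document to the codec that understands its bytes.
class CodecRegistry {
    private final Map<Integer, DocCodec> codecs = new HashMap<>();

    void register(int codecId, DocCodec codec) {
        codecs.put(codecId, codec);
    }

    byte[] write(int codecId, Object doc) {
        return lookup(codecId).serialize(doc);
    }

    // Polymorphic dispatch replaces the if-else chain over bit flags.
    Object read(int codecId, byte[] raw) {
        return lookup(codecId).deserialize(raw);
    }

    private DocCodec lookup(int codecId) {
        DocCodec codec = codecs.get(codecId);
        if (codec == null) {
            throw new IllegalArgumentException("unknown codec id " + codecId);
        }
        return codec;
    }
}
```

Supporting a new kind of stored object then means registering one more codec under a new id; the reading code itself doesn't change.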
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/