> AFAIK, *.fdt files aren't used for searching the index, only
> retrieving stored fields.
> Currently, binary & compressed are options per field value, not per
> field.  Storing binary/compressed in the *.fnm file would mean that
> they would have to be the same for all values of that field.

That's correct. Is it a problem? (Please tell me if it is.)
I do not see any compelling reason to mix binary and
non-binary values in the same field.

And if such a situation does arise, one solution is to store
all values as binary and let the application (rather than
the Lucene library) decide what the real type of the stored
value is, e.g. by saving the type in the first byte.

The same goes for compression: if there is a need to store
both compressed and uncompressed data in the same field,
save the data (compressed, encoded, uncompressed, ...) as
binary and let the first byte indicate how the data was
packaged.
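
A minimal sketch of what I mean, written entirely on the
application side. The class TaggedPayload, its tag values and its
method names are hypothetical and not part of Lucene; only
java.util.zip from the JDK is used. The first byte of the stored
value says how the rest was packaged:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical application-side helper; not part of Lucene. */
final class TaggedPayload {
    static final byte RAW = 0;      // payload stored as-is
    static final byte DEFLATE = 1;  // payload compressed with java.util.zip

    private TaggedPayload() {}

    /** Prefixes the payload with one byte describing how it was packaged. */
    static byte[] pack(byte[] data, boolean compress) {
        byte[] body = compress ? deflate(data) : data;
        byte[] out = new byte[body.length + 1];
        out[0] = compress ? DEFLATE : RAW;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    /** Reads the tag byte and restores the original payload. */
    static byte[] unpack(byte[] stored) {
        if (stored.length == 0) {
            throw new IllegalArgumentException("missing tag byte");
        }
        switch (stored[0]) {
            case RAW:
                return Arrays.copyOfRange(stored, 1, stored.length);
            case DEFLATE:
                return inflate(stored, 1, stored.length - 1);
            default:
                throw new IllegalArgumentException("unknown tag " + stored[0]);
        }
    }

    private static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            bos.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return bos.toByteArray();
    }

    private static byte[] inflate(byte[] data, int off, int len) {
        Inflater inflater = new Inflater();
        inflater.setInput(data, off, len);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read: the input was cut short.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated payload");
                }
                bos.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt payload", e);
        } finally {
            inflater.end();
        }
        return bos.toByteArray();
    }
}
```

The application would hand the result of pack() to Lucene as an
ordinary binary field value and call unpack() on whatever it reads
back, so the library itself never needs to know about the mix.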

What I mean is: storage schemes of any complexity become
possible as soon as a field can hold binary data, but they
are then up to the application.
Declaring field attributes immutable across an index
makes for a clear and simple design.

>
> That would also make merging segments very challenging...
> If one segment had some binary fields, and another segment had
> non-binary fields, what would one do when merging the two segments?

Well, I think merging segments should be possible only if
the field definitions are consistent across the segments.
Merging inconsistent segments looks to me like an error at worst
and bad design at best. But I may just not have met an
appropriate use case yet...
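
To illustrate, here is a rough sketch of such a consistency check.
It assumes each segment's field definitions can be viewed as a map
from field name to an int of attribute bits; that representation
and the class FieldConsistency are my assumptions for illustration,
not how the merge code currently works:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical check; not existing Lucene code. */
final class FieldConsistency {
    private FieldConsistency() {}

    /**
     * Combines per-segment field definitions (field name -> attribute bits)
     * and refuses to continue if two segments disagree about a field.
     */
    static Map<String, Integer> mergeDefinitions(List<Map<String, Integer>> segments) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> segment : segments) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                Integer previous = merged.putIfAbsent(e.getKey(), e.getValue());
                if (previous != null && !previous.equals(e.getValue())) {
                    throw new IllegalStateException(
                        "inconsistent definition for field '" + e.getKey() + "'");
                }
            }
        }
        return merged;
    }
}
```

A field that appears in only one segment is accepted as-is; only a
real disagreement between segments stops the merge.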

> > The nine booleans in Field could be replaced by a bit mask (int).
>
> Personally, that's my preference, but it's more "C" like, and feels

That's how I came to this point (http://sourceforge.net/projects/phplucene).

> like it goes in the opposite direction from where people have been
> taking the Field class (for example, the type safe enum pattern,
> Field.Index and Field.Store classes).

It's not a contradiction. The Index/Store/TermVector classes do not
depend on whether field attributes are implemented as booleans
or as a bit mask. I just vote for keeping them all in one place (ideally
FieldInfo) and storing them in one place (*.fnm).

Consider how field attributes are handled in the current implementation.
They are scattered over:

--- Field (boolean members, 9)
--- FieldInfo (boolean members, partly the same as in Field)
--- FieldInfos (bit masks for storing some attributes in *.fnm)
--- FieldsWriter (bit masks for storing some other attributes in *.fdt)
--- FieldsReader (logic for initializing field attributes; touches all
the classes above)
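
As a rough sketch of the alternative, all attributes could live in
one small value class backed by a single int. The class FieldFlags
and its flag names are illustrative assumptions only; they loosely
follow the attributes mentioned in this thread and are not the
actual constants used in FieldInfos or FieldsWriter:

```java
/** Hypothetical single home for field attributes; not existing Lucene code. */
final class FieldFlags {
    // Illustrative flag names, one bit each.
    static final int INDEXED     = 1 << 0;
    static final int TOKENIZED   = 1 << 1;
    static final int STORED      = 1 << 2;
    static final int BINARY      = 1 << 3;
    static final int COMPRESSED  = 1 << 4;
    static final int TERM_VECTOR = 1 << 5;

    private final int bits;

    FieldFlags(int bits) {
        this.bits = bits;
    }

    /** True if every bit of the given flag is set. */
    boolean has(int flag) {
        return (bits & flag) == flag;
    }

    /** Returns a copy with the given flag added; instances stay immutable. */
    FieldFlags with(int flag) {
        return new FieldFlags(bits | flag);
    }

    /** The raw int, i.e. the one value that would have to be written per field. */
    int toInt() {
        return bits;
    }
}
```

Type-safe front ends such as Field.Index and Field.Store could
still be offered on top of this; they would simply translate to and
from the bits.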

Thank you for taking the time to read this. If the subject is
of interest, I'd be glad to prepare a patch.

Robert

