> AFAIK, *.fdt files aren't used for searching the index, only
> retrieving stored fields.
> Currently, binary & compressed are options per field value, not per
> field. Storing binary/compressed in the *.fnm file would mean that
> they would have to be the same for all values of that field.

That's correct. Is it a problem? (Maybe it is; please tell me if so.) I do not see any compelling reason to mix binary and non-binary values in the same field. If such a situation does arise, the application could store all values as binary and decide for itself (rather than the Lucene library) what the real type of each stored value is, e.g. by saving the type in the first byte.

The same goes for compression: if there is a need to store compressed and uncompressed data in the same field, save the data (compressed, encoded, uncompressed, ...) as binary and let the first byte indicate how it was packaged.

What I mean is: storage schemes of any complexity can be built as soon as it is possible to store binary data in a field, but that is then up to the application. Declaring field attributes immutable across an index makes for a clear and simple design.

> That would also make merging segments very challenging...
> If one segment had some binary fields, and another segment had
> non-binary fields, what would one do when merging the two segments?

Well, I think merging segments should be possible only if the field definitions are consistent across the segments. Merging inconsistent segments looks to me like bad design at best and an error at worst. But I may just not have met an appropriate use case yet...

> > The nine booleans in Field could be replaced by a bit mask (int).
>
> Personally, that's my preference, but it's more "C" like, and feels

That's why I came to the point ( http://sourceforge.net/projects/phplucene )

> like it goes in the opposite direction from where people have been
> taking the Field class (for example, the type safe enum pattern,
> Field.Index and Field.Store classes).

It's not a contradiction. The Index/Store/TermVector classes do not depend on whether the field attributes are implemented as booleans or as a bit mask. I just vote for putting them all in one place (ideally FieldInfo) and storing them in one place (*.fnm).
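
To make the first-byte idea for mixed values concrete, here is a minimal sketch of what the application side might look like. The class `TaggedValue`, its constants and its method names are all hypothetical and not part of Lucene; the only assumption is that a field can store an arbitrary byte array.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical application-side helper, not part of Lucene: the first byte
// of a stored binary value tells the application how to read the rest.
final class TaggedValue {
    static final byte TYPE_TEXT = 0;    // payload is UTF-8 text
    static final byte TYPE_BINARY = 1;  // payload is raw bytes

    private TaggedValue() {}

    static byte[] encodeText(String text) {
        return prepend(TYPE_TEXT, text.getBytes(StandardCharsets.UTF_8));
    }

    static byte[] encodeBinary(byte[] data) {
        return prepend(TYPE_BINARY, data);
    }

    // Returns the type tag stored in the first byte.
    static byte typeOf(byte[] stored) {
        return stored[0];
    }

    // Returns the value without its leading type tag.
    static byte[] payload(byte[] stored) {
        return Arrays.copyOfRange(stored, 1, stored.length);
    }

    static String decodeText(byte[] stored) {
        if (typeOf(stored) != TYPE_TEXT) {
            throw new IllegalArgumentException("not a text value");
        }
        return new String(stored, 1, stored.length - 1, StandardCharsets.UTF_8);
    }

    private static byte[] prepend(byte tag, byte[] body) {
        byte[] out = new byte[body.length + 1];
        out[0] = tag;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }
}
```

The same pattern would extend to compression by reserving further tag values, with the index itself only ever seeing one uniformly binary field.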
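
As a rough illustration of the bit mask proposal, the attributes could be held in a single int along the following lines. This is only a sketch under my own assumptions: `FieldAttributes`, the flag names and the bit positions are hypothetical, and they do not reflect the actual layout of FieldInfo or of the *.fnm file.

```java
// Hypothetical sketch, not Lucene's actual FieldInfo: all per-field
// attributes live in one int that could be written to *.fnm in one go.
final class FieldAttributes {
    // Illustrative flag names and bit positions (assumptions, not Lucene's).
    static final int INDEXED     = 1 << 0;
    static final int TOKENIZED   = 1 << 1;
    static final int STORED      = 1 << 2;
    static final int BINARY      = 1 << 3;
    static final int COMPRESSED  = 1 << 4;
    static final int TERM_VECTOR = 1 << 5;

    private final int bits;

    FieldAttributes(int bits) {
        this.bits = bits;
    }

    // True if every bit of the given flag (or flag combination) is set.
    boolean has(int flag) {
        return (bits & flag) == flag;
    }

    // The raw mask, i.e. the single value that would be persisted.
    int bits() {
        return bits;
    }

    // Segments could be merged only if a field's attributes are identical.
    boolean isConsistentWith(FieldAttributes other) {
        return bits == other.bits;
    }
}
```

Type-safe wrappers such as Field.Index and Field.Store could still sit on top of this, translating their constants into flags, and the merge-time consistency check reduces to one integer comparison per field.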

Consider the handling of field attributes in the current implementation. It is scattered over:

--- Field (boolean members, 9)
--- FieldInfo (boolean members, partially the same as in Field)
--- FieldInfos (bit masks for storing some attributes in *.fnm)
--- FieldsWriter (bit masks for storing some other attributes in *.fdt)
--- FieldsReader (logic for initializing field attributes, touches all the classes above)

Thank you for taking the time to read this. If the subject is of interest to you, I'd prepare a patch.

Robert