Hi,

> On Fri, Aug 30, 2013 at 3:00 PM, Shai Erera <[email protected]> wrote:
> > The Codec itself may not be needed to be specified in
> > META-INF/services, but the DVFormat it uses is.
>
> Correct, because otherwise Lucene couldn't read segments written with this
> DV format.
>
> > So it's not like you can define a Codec today which does not list
> > anything in 'services', unless your Codec just reuses one of the
> > predefined DVF/PF listed under core/codecs. Is that right?
>
> This is correct, and I do that quite often (defining anonymous sub-classes of
> Lucene45 without registering anything in META-INF/services).
>
> > It's confusing that these per-field things are used differently while
> > indexing and reading. At indexing, the Codec decides what to return
> > per-field, at search the Codec is more or less not used, cause the
> > per-field formats are read from FI.attributes and initialized directly.
>
> I don't find it confusing. At reading time, it just does the right thing, ie.
> using the format which has been used for writing on a per-segment basis. So you
> could change your mind and decide all your doc values formats should be
> disk-based although some of them were memory-based and it will just work.
> Old segments will still use MemoryDocValuesFormat while the new ones
> written with DiskDocValuesFormat will be disk-based. The index will be
> progressively migrated as segments are merged.
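To make the write/read asymmetry in the quoted text concrete, here is a minimal, self-contained sketch. It is not Lucene code: `Format`, `Codec`, `Segment`, `REGISTRY`, `writeSegment` and `openField` are hypothetical stand-ins, and the plain map only plays the role that the META-INF/services lookup plays in Lucene. The point it shows is that the writer records just the format *name* per field, and the reader resolves that name without asking the codec.

```java
import java.util.HashMap;
import java.util.Map;

public class NamedFormatSketch {

    /** Hypothetical stand-in for a named per-field format (not Lucene's DocValuesFormat). */
    static final class Format {
        final String name;
        Format(String name) { this.name = name; }
    }

    /** Plays the role of the name-based service lookup: name -> implementation. */
    static final Map<String, Format> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("Memory", new Format("Memory"));
        REGISTRY.put("Disk", new Format("Disk"));
    }

    /** Resolves a format by name; fails if nothing is registered under that name. */
    static Format forName(String name) {
        Format f = REGISTRY.get(name);
        if (f == null) {
            throw new IllegalArgumentException("No format registered under name '" + name + "'");
        }
        return f;
    }

    /** Write side only: the codec decides which format each field gets. */
    interface Codec {
        Format formatForField(String field);
    }

    /** A segment remembers, per field, only the NAME of the format that wrote it. */
    static final class Segment {
        final Map<String, String> fieldToFormatName = new HashMap<>();
    }

    static Segment writeSegment(Codec codec, String... fields) {
        Segment segment = new Segment();
        for (String field : fields) {
            segment.fieldToFormatName.put(field, codec.formatForField(field).name);
        }
        return segment;
    }

    /** Read side: the codec is not consulted, only the name stored in the segment. */
    static Format openField(Segment segment, String field) {
        return forName(segment.fieldToFormatName.get(field));
    }

    public static void main(String[] args) {
        // An unregistered, ad-hoc codec is enough as long as the formats it returns are registered.
        Segment oldSegment = writeSegment(field -> forName("Memory"), "price");
        // Later the application changes its mind and writes new segments disk-based.
        Segment newSegment = writeSegment(field -> forName("Disk"), "price");

        System.out.println(openField(oldSegment, "price").name); // prints "Memory"
        System.out.println(openField(newSegment, "price").name); // prints "Disk"
    }
}
```

Anything that is not recorded by name in the segment cannot be resolved this way at read time, and the reader has to rely on the codec itself instead.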
I think the most confusing part here is the inconsistency between several parts:
If everything in Lucene's default codec used META-INF to look things up by name,
we would never need other instances of Lucene's main codec in META-INF. It would
always be enough to use anonymous classes to change postings formats, DV
formats,... on writing, and IndexReader would always know how to read them.

But unfortunately not all parts are looked up through META-INF. If you want to
plug in another stored fields format, you have to define a new codec, because
StoredFieldsFormats have no name and are not written to the index file. I think
we should change this and make every part that a codec offers to the outside
implement NamedSPI, so it can be read from the index (not only the per-field
stuff). Replacing Lucene's main codec would then only be needed if you change
the basic file format completely (like adding a replacement for compound files,
new segment infos formats,...).

Uwe
