I don't think we should go that far. If you extend Lucene45Codec you basically agree to the entire index format, but are given a chance to control per-field postings and doc values. Otherwise, make your own Codec, and then you'll need to register it in META-INF/services.
The assert I proposed to make in the ctor is only for "education purposes" -- apps need not register their Lucene45CodecExtension in services. We can document it, and assertions would help verify it.

Shai

On Fri, Aug 30, 2013 at 4:49 PM, Uwe Schindler <[email protected]> wrote:

> Hi,
>
> > On Fri, Aug 30, 2013 at 3:00 PM, Shai Erera <[email protected]> wrote:
> > > The Codec itself may not be needed to be specified in
> > > META-INF/services, but the DVFormat it uses is.
> >
> > Correct, because otherwise Lucene couldn't read segments written with
> > this DV format.
> >
> > > So it's not like you can define a Codec today which does not list
> > > anything in 'services', unless your Codec just reuses one of the
> > > predefined DVF/PF listed under core/codecs. Is that right?
> >
> > This is correct, and I do that quite often (defining anonymous
> > sub-classes of Lucene45 without registering anything in
> > META-INF/services).
> >
> > > It's confusing that these per-field things are used differently while
> > > indexing and reading. At indexing, the Codec decides what to return
> > > per-field, at search the Codec is more or less not used, cause the
> > > per-field formats are read from FI.attributes and initialized directly.
> >
> > I don't find it confusing. At reading time, it just does the right
> > thing, ie. using the format which has been used for writing on a
> > per-segment basis. So you could change your mind and decide all your
> > doc values formats should be disk-based although some of them were
> > memory-based and it will just work. Old segments will still use
> > MemoryDocValuesFormat while the new ones written with
> > DiskDocValuesFormat will be disk-based. The index will be
> > progressively migrated as segments are merged.
>
> I think the most confusing part here is inconsistency between several
> parts: If everything in the Lucene Default Codec would use META-INF to
> lookup stuff by name, we would never ever need to have other instances of
> Lucene's main codec in META-INF (it would always be enough to have
> anonymous classes to change postings formats, DV formats,... on writing,
> IndexReader would always know how to read). But unfortunately not all parts
> are done by META-INF. If you want to plug in another stored fields format,
> you have to define a new codec, because StoredFieldsFormats have no name
> and are not written to the index file.
>
> I think we should change this and make all parts a codec offers to the
> outside implement NamedSPI, so it can be read from the index (not only the
> per-field stuff). Replacing a new Lucene main codec would then only be
> possible if you change the basic file format completely (like adding a
> replacement for compound files, new segment infos formats,...).
>
> Uwe
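
To make the write-time versus read-time distinction from the quoted discussion concrete, here is a minimal toy model in plain Java. It is not Lucene code: NamedFormatSketch, NamedFormat, REGISTRY, register, writeSegment and formatForReading are all hypothetical names invented for this sketch, the map stands in for the META-INF/services lookup, and the per-segment attribute map stands in for the format names recorded per field. It only shows the idea that the writer's per-field choice is stored by name and resolved by name when reading.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class NamedFormatSketch {
    // Hypothetical stand-in for a named per-field format (not a Lucene type).
    public interface NamedFormat {
        String name();
    }

    // Stand-in for the name-based service lookup: formats resolvable by name.
    public static final Map<String, NamedFormat> REGISTRY = new HashMap<>();

    public static NamedFormat register(String name) {
        NamedFormat format = () -> name;
        REGISTRY.put(name, format);
        return format;
    }

    // Write side: the "codec" (here just a function) picks a format per field,
    // and only the format's name is recorded with the segment.
    public static Map<String, String> writeSegment(
            Function<String, NamedFormat> perFieldChoice, String... fields) {
        Map<String, String> attributes = new HashMap<>();
        for (String field : fields) {
            attributes.put(field, perFieldChoice.apply(field).name());
        }
        return attributes;
    }

    // Read side: the current per-field choice is not consulted at all;
    // the recorded name is resolved through the registry.
    public static NamedFormat formatForReading(Map<String, String> attributes, String field) {
        String name = attributes.get(field);
        NamedFormat format = REGISTRY.get(name);
        if (format == null) {
            throw new IllegalArgumentException("No format registered under name '" + name + "'");
        }
        return format;
    }

    public static void main(String[] args) {
        NamedFormat memory = register("Memory");
        NamedFormat disk = register("Disk");

        // An old segment written while the application preferred "Memory" ...
        Map<String, String> oldSegment = writeSegment(field -> memory, "price");
        // ... and a newer one written after switching to "Disk".
        Map<String, String> newSegment = writeSegment(field -> disk, "price");

        System.out.println(formatForReading(oldSegment, "price").name());
        System.out.println(formatForReading(newSegment, "price").name());
    }
}
```

In this model, a part of the index format that has no name (as described above for stored fields formats) would have nothing to record in the attribute map, which is why swapping it would need a different mechanism than the per-field lookup.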
