I don't think we should go that far. If you extend Lucene45Codec you
basically agree to the entire index format, but are given a chance to
control per-field postings and doc values. Otherwise, make your own Codec,
and then you'll need to register it in META-INF/services.
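
To make that distinction concrete, here is a minimal sketch in plain Java.
It deliberately does not use Lucene's classes: BaseCodec, Format, REGISTRY,
writeSegment and readField are hypothetical stand-ins for the codec, a
per-field format, the META-INF/services lookup, and the write and read
paths. It assumes only what this thread describes, namely that the codec
picks a format per field at write time, that the format's name is stored
with the segment, and that the name is resolved at read time.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are Lucene classes.
class PerFieldFormatSketch {

    /** Stand-in for a named per-field format (postings or doc values). */
    static class Format {
        final String name;
        Format(String name) { this.name = name; }
    }

    /** Stand-in for the name lookup that META-INF/services provides. */
    static final Map<String, Format> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("Default", new Format("Default"));
        REGISTRY.put("Memory", new Format("Memory"));
    }

    static Format forName(String name) {
        Format format = REGISTRY.get(name);
        if (format == null) {
            throw new IllegalArgumentException("no format registered as '" + name + "'");
        }
        return format;
    }

    /** Stand-in for the codec: subclasses only choose a format per field. */
    static class BaseCodec {
        Format formatForField(String field) { return forName("Default"); }
    }

    /** Write side: ask the codec per field, record only the format's name. */
    static Map<String, String> writeSegment(BaseCodec codec, String... fields) {
        Map<String, String> fieldAttributes = new HashMap<>();
        for (String field : fields) {
            fieldAttributes.put(field, codec.formatForField(field).name);
        }
        return fieldAttributes;
    }

    /** Read side: the codec subclass is not consulted, only the recorded name. */
    static Format readField(Map<String, String> fieldAttributes, String field) {
        return forName(fieldAttributes.get(field));
    }

    public static void main(String[] args) {
        // Anonymous subclass: nothing about it needs to be registered,
        // because it only reuses formats that are already registered.
        BaseCodec custom = new BaseCodec() {
            @Override
            Format formatForField(String field) {
                return field.equals("id") ? forName("Memory") : super.formatForField(field);
            }
        };
        Map<String, String> segment = writeSegment(custom, "id", "title");
        System.out.println(readField(segment, "id").name);    // prints "Memory"
        System.out.println(readField(segment, "title").name); // prints "Default"
    }
}
```

In this model, a subclass that returned a Format missing from REGISTRY would
still write its segment, but readField would then fail, which is the reason
a custom format has to be registered even when the codec subclass does not.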

The assert I proposed adding to the ctor is only for "educational purposes"
-- apps need not register their Lucene45CodecExtension in services. We can
document it, and assertions would help verify it.

Shai


On Fri, Aug 30, 2013 at 4:49 PM, Uwe Schindler <[email protected]> wrote:

> Hi,
>
> > On Fri, Aug 30, 2013 at 3:00 PM, Shai Erera <[email protected]> wrote:
> > > The Codec itself may not need to be specified in
> > > META-INF/services, but the DVFormat it uses does.
> >
> > Correct, because otherwise Lucene couldn't read segments written with
> > this DV format.
> >
> > > So it's not like you can define a Codec today which does not list
> > > anything in 'services', unless your Codec just reuses one of the
> > > predefined DVF/PF listed under core/codecs. Is that right?
> >
> > This is correct, and I do that quite often (defining anonymous
> > sub-classes of Lucene45 without registering anything in
> > META-INF/services).
> >
> > > It's confusing that these per-field things are used differently while
> > > indexing and reading. At indexing, the Codec decides what to return
> > > per-field, at search the Codec is more or less not used, cause the
> > > per-field formats are read from FI.attributes and initialized directly.
> >
> > I don't find it confusing. At reading time, it just does the right
> > thing, i.e. using the format which has been used for writing on a
> > per-segment basis. So you could change your mind and decide all your
> > doc values formats should be disk-based although some of them were
> > memory-based and it will just work. Old segments will still use
> > MemoryDocValuesFormat while the new ones written with
> > DiskDocValuesFormat will be disk-based. The index will be
> > progressively migrated as segments are merged.
>
> I think the most confusing part here is the inconsistency between several
> parts: If everything in the Lucene default codec used META-INF to
> look up stuff by name, we would never need to have other instances of
> Lucene's main codec in META-INF (it would always be enough to have
> anonymous classes to change postings formats, DV formats,... on writing,
> IndexReader would always know how to read). But unfortunately not all parts
> are done by META-INF. If you want to plug in another stored fields format,
> you have to define a new codec, because StoredFieldsFormats have no name
> and are not written to the index file.
>
> I think we should change this and make all parts a codec offers to the
> outside implement NamedSPI, so it can be read from the index (not only the
> per-field stuff). Replacing a new Lucene main codec would then only be
> possible if you change the basic file format completely (like adding a
> replacement for compound files, new segment infos formats,...).
>
> Uwe
