Hi
While debugging a test failure (unrelated to the subject of this email), I
noticed something strange. I set the Codec to Lucene45Codec, yet when the
segment was read, it used Facet45Codec. Digging further I found this:
- All Lucene45Codec extensions share the same "Lucene45" name, and there's
no way to override that.
- facet/META-INF was loaded before core/META-INF, so NamedSPILoader set
"Lucene45=Facet45Codec", and when it later discovered Lucene45Codec, it
didn't override the mapping (which is how it is defined to behave).
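
To illustrate the second point, here is a minimal sketch of a lookup table
where the first registration for a name wins. It is not the NamedSPILoader
source; FirstWinsRegistry and its methods are hypothetical names, and the
implementations are modeled as plain strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a name-to-implementation lookup.
// Assumption: the first implementation registered under a name is kept.
class FirstWinsRegistry {
    private final Map<String, String> byName = new LinkedHashMap<>();

    // Registers an implementation unless the name is already taken.
    void register(String name, String implementation) {
        byName.putIfAbsent(name, implementation);
    }

    // Returns the implementation registered under the name, or null.
    String lookup(String name) {
        return byName.get(name);
    }
}

class FirstWinsDemo {
    public static void main(String[] args) {
        FirstWinsRegistry registry = new FirstWinsRegistry();
        // facet/META-INF is seen first, core/META-INF second.
        registry.register("Lucene45", "Facet45Codec");
        registry.register("Lucene45", "Lucene45Codec");
        System.out.println(registry.lookup("Lucene45")); // prints Facet45Codec
    }
}
```

With both classes reporting the same name, whichever one is discovered
first is the one handed back for "Lucene45".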
In this case there's no real harm done: all Facet45Codec does is apply
special logic for the "facet" fields, and at search time the DVFormat and
PostingsFormat are loaded by PerField based on the attributes it recorded
in FieldInfo. Still, I think it's better if Codecs can have unique names.
It's definitely healthier (and safer) to know that the same Codec type that
was used to write a segment is also used to read it.
I think this is very easy to solve if we allow Lucene45Codec extensions to
"own" their name if they choose to, by having two ctors on Lucene45Codec:
1. Default ctor (no args) which defaults the name to "Lucene45"
2. Protected ctor which takes a name as an argument and can be called by
extensions' ctors. That way, Facet45Codec could call super("Facet45") and
own its name.
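
A rough sketch of the two ctors, to make the proposal concrete. The classes
below are hypothetical stand-ins (prefixed with Sketch) that model only the
name, not the real Codec API, so treat this as an outline rather than a
patch.

```java
// Hypothetical stand-in for the Codec base class; only the name is modeled.
abstract class SketchCodec {
    private final String name;

    protected SketchCodec(String name) {
        this.name = name;
    }

    public final String getName() {
        return name;
    }
}

class SketchLucene45Codec extends SketchCodec {
    // 1. Default ctor (no args): the name defaults to "Lucene45".
    public SketchLucene45Codec() {
        this("Lucene45");
    }

    // 2. Protected ctor: lets an extension pass its own name.
    protected SketchLucene45Codec(String name) {
        super(name);
    }
}

class SketchFacet45Codec extends SketchLucene45Codec {
    public SketchFacet45Codec() {
        super("Facet45"); // the extension owns its name
    }
}

class CodecNameDemo {
    public static void main(String[] args) {
        System.out.println(new SketchLucene45Codec().getName()); // prints Lucene45
        System.out.println(new SketchFacet45Codec().getName());  // prints Facet45
    }
}
```

Extensions that keep calling the no-arg ctor would continue to report
"Lucene45", so only those that opt in get a distinct name.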
What do you think? Do you see any problem this may cause?
Shai