[
https://issues.apache.org/jira/browse/LUCENE-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144645#comment-13144645
]
Uwe Schindler commented on LUCENE-3560:
---------------------------------------
Heavy reflecting in the good old TokenStream assertFinal (I was so unhappy when
Analyzer was restructured that it had to go there... *g*).
Some comments:
- We should maybe also add a check that at least a default (no-arg) constructor
is available, i.e. that this.getClass().getConstructor() does not throw an
exception.
- In general, subclassing a Codec or a PostingsFormat is wrong (except the
Lucene3x hack). If you subclass a codec/PF, you can no longer change its
name. So anybody who subclasses a codec will produce a clone with the same name
but perhaps another index format. This is prevented by Robert's finalness on
the format hooks, but what else could a codec do differently, if it's not
final, without breaking the index format?
- I think even the 3x Codec should be final and not subclassed by the RW codec.
The RW Preflex codec in tests should subclass the abstract Codec and simply
delegate all "read" methods to the RO codec [I am not sure if this all works,
as it's very complicated... *g* - I only mention: new
Exception().getStackTrace() to inspect the call stack... highly sophisticated!].
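
The checks suggested in the comments above could be sketched roughly as
follows. This is only an illustration, not the attached patch:
CodecSafetyChecks, DemoCodec, GoodCodec and BadCodec are hypothetical stand-ins
and not Lucene classes, and a String return type stands in for PostingsFormat.
The sketch uses only java.lang.reflect to test for a public no-arg constructor
and for finalness of either the class or a given no-arg method.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class CodecSafetyChecks {

    /** Returns true if clazz has a public no-arg constructor. */
    public static boolean hasPublicNoArgConstructor(Class<?> clazz) {
        try {
            clazz.getConstructor(); // throws if no public no-arg ctor exists
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Returns true if clazz is final, or its public no-arg method methodName is final. */
    public static boolean isClassOrMethodFinal(Class<?> clazz, String methodName) {
        if (Modifier.isFinal(clazz.getModifiers())) {
            return true; // a final class cannot be subclassed at all
        }
        try {
            Method m = clazz.getMethod(methodName);
            return Modifier.isFinal(m.getModifiers());
        } catch (NoSuchMethodException e) {
            return false; // no such public method: treat as not verified
        }
    }

    // Hypothetical stand-in for an abstract codec base class (not Lucene's Codec).
    public abstract static class DemoCodec {
        public abstract String postingsFormat();
    }

    // Final class with a public no-arg constructor: passes both checks.
    public static final class GoodCodec extends DemoCodec {
        public GoodCodec() {}
        @Override public String postingsFormat() { return "Good"; }
    }

    // Non-final class, non-final method, no no-arg constructor: fails both checks.
    public static class BadCodec extends DemoCodec {
        public BadCodec(String param) {}
        @Override public String postingsFormat() { return "Bad"; }
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] { GoodCodec.class, BadCodec.class }) {
            System.out.println(c.getSimpleName()
                + " noArgCtor=" + hasPublicNoArgConstructor(c)
                + " final=" + isClassOrMethodFinal(c, "postingsFormat"));
        }
    }
}
```

In a real codec base class such checks would presumably run under an assert in
the constructor, so that they cost nothing when assertions are disabled; that
placement is an assumption here, not something taken from the patch.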
> add extra safety to concrete codec implementations
> --------------------------------------------------
>
> Key: LUCENE-3560
> URL: https://issues.apache.org/jira/browse/LUCENE-3560
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 4.0
> Reporter: Robert Muir
> Attachments: LUCENE-3560.patch
>
>
> In LUCENE-3490, we reorganized the codec model, and a key part of this is
> that Codecs are "safer" and don't rely upon client-side configuration:
> IndexReader doesn't take Codec or anything of that nature, only IndexWriter.
> Instead for "read" all codecs are initialized from the classpath via a no-arg
> ctor from Java's Service Provider Mechanism.
> So, although Codecs can still take parameters in the constructors, be
> subclassable, etc (for passing to IndexWriter), this enforces that they must
> write any configuration information they need into the index, so that we
> don't have a flimsy API.
> I think we should go even further, for additional safety. Any methods on our
> concrete codecs that are not intended to be subclassed should be final, and
> we should add assertions to verify this.
> For example, SimpleText's files() implementation should be final. If you want
> to make an extension of simpletext that has additional files, then this is a
> different index format and should have a different name!
> Note: This doesn't stop extensibility, only stupid mistakes.
> For example, this means that Lucene40Codec's postingsFormat() implementation
> is final, even though it offers a configurable "hook"
> (getPostingsFormatForField) for you to specify per-field postings formats
> (which it writes into a .per file into the index, so that it knows how to
> read each field).
> {code}
>   private final PostingsFormat postingsFormat = new PerFieldPostingsFormat() {
>     @Override
>     public PostingsFormat getPostingsFormatForField(String field) {
>       return Lucene40Codec.this.getPostingsFormatForField(field);
>     }
>   };
>   ...
>   @Override
>   public final PostingsFormat postingsFormat() {
>     return postingsFormat;
>   }
>   ...
>   /** Returns the postings format that should be used for writing
>    *  new segments of <code>field</code>.
>    *
>    *  The default implementation always returns "Lucene40"
>    */
>   public PostingsFormat getPostingsFormatForField(String field) {
>     return defaultFormat;
>   }
> {code}