[ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796027#action_12796027 ]
Uwe Schindler edited comment on LUCENE-2183 at 1/3/10 11:18 PM:
----------------------------------------------------------------

{quote}
There is only one exception where reflection is used... that is during ctor to determine if:
- you subclass a tokenizer that implements int-based methods
- you have only implemented char-based methods
- you request VERSION >= 3.1
{quote}

With LUCENE-2188, this is easy and poses no performance problem. Just define two static final fields, one for each char-based method, and check in the ctor whether this.getClass() overrides either char-based method. In this case throw UOE. The result is cached per class, so further instantiations of the same class will not use reflection anymore:

{code}
private static final OverrideableMethod<CharTokenizer> isTokenCharMethod =
  new OverrideableMethod<CharTokenizer>(CharTokenizer.class, "isTokenChar", char.class);
private static final OverrideableMethod<CharTokenizer> normalizeMethod =
  new OverrideableMethod<CharTokenizer>(CharTokenizer.class, "normalize", char.class);

...

public CharTokenizer(...) {
  super(...);
  // Reject subclasses that still override the char-based methods when 3.1 behavior is requested.
  if (matchVersion.onOrAfter(Version.LUCENE_31) && (
    isTokenCharMethod.isOverriddenBy(this.getClass()) || normalizeMethod.isOverriddenBy(this.getClass())
  )) throw new IllegalArgumentException("For matchVersion >= LUCENE_31, CharTokenizer subclasses must not override isTokenChar(char) or normalize(char).");
}
{code}

      was (Author: thetaphi):
    {quote}
There is only one exception where reflection is used... that is during ctor to determine if:
- you subclass a tokenizer that implements int-based methods
- you have only implemented char-based methods
- you request VERSION >= 3.1
{quote}

With LUCENE-2188, this is easy and no performance problem. Just define two static final fields for both char-based methods and check in the ctor if this.getClass() overrides the char-based method. In this case throw UOE. The result is cached for the class and further instantiations of the same class will not use reflection anymore:

{code}
private static final OverrideableMethod<CharTokenizer> isTokenCharMethod =
  new OverrideableMethod<CharTokenizer>(CharTokenizer.class, "isTokenChar", char.class);
private static final OverrideableMethod<CharTokenizer> normalizeMethod =
  new OverrideableMethod<CharTokenizer>(CharTokenizer.class, "normalize", char.class);

...

public CharTokenizer(...) {
  super(...);
  if (matchVersion.onOrAfter(Version.LUCENE_31) && (
    isTokenCharMethod.getOverrideDistance(this.getClass()) > 0 || normalizeMethod.getOverrideDistance(this.getClass()) > 0
  )) throw new IllegalArgumentException("For matchVersion >= LUCENE_31, CharTokenizer subclasses must not override isTokenChar(char) or normalize(char).");
}
{code}

> Supplementary Character Handling in CharTokenizer
> -------------------------------------------------
>
>                 Key: LUCENE-2183
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2183
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Simon Willnauer
>             Fix For: 3.1
>
>         Attachments: LUCENE-2183.patch
>
>
> CharTokenizer is an abstract base class for all Tokenizers operating on a
> character level. Yet, those tokenizers still use char primitives instead of
> int codepoints. CharTokenizer should operate on codepoints and preserve
> backwards compatibility.
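
For readers without the LUCENE-2188 patch at hand, the constructor check proposed in the comment can be sketched roughly as follows. This is only an illustration under stated assumptions: OverrideCheck, BaseTokenizer, CharBased and CodePointBased are hypothetical names invented here, the boolean strict flag stands in for matchVersion.onOrAfter(Version.LUCENE_31), and the helper is not the OverrideableMethod class from the patch. It only shows the idea of one reflective lookup per concrete class with the answer cached afterwards.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not the LUCENE-2188 class): does a subclass redeclare a base-class method? */
final class OverrideCheck<C> {
    private final Class<C> baseClass;
    private final String methodName;
    private final Class<?>[] parameterTypes;
    // One cached answer per concrete class, so reflection runs only once per class.
    private final Map<Class<?>, Boolean> cache = new ConcurrentHashMap<>();

    OverrideCheck(Class<C> baseClass, String methodName, Class<?>... parameterTypes) {
        this.baseClass = baseClass;
        this.methodName = methodName;
        this.parameterTypes = parameterTypes.clone();
    }

    boolean isOverriddenBy(Class<? extends C> subclass) {
        return cache.computeIfAbsent(subclass, this::declaresBelowBase);
    }

    // Walks from the given class up to, but excluding, the base class.
    // Simplification: any declaration with a matching signature counts, even a private one.
    private boolean declaresBelowBase(Class<?> clazz) {
        for (Class<?> c = clazz; c != null && c != baseClass; c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod(methodName, parameterTypes);
                if (m != null) {
                    return true;
                }
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up the hierarchy.
            }
        }
        return false;
    }
}

/** Hypothetical stand-in for CharTokenizer; "strict" plays the role of matchVersion >= 3.1. */
abstract class BaseTokenizer {
    private static final OverrideCheck<BaseTokenizer> IS_TOKEN_CHAR =
        new OverrideCheck<>(BaseTokenizer.class, "isTokenChar", char.class);

    protected BaseTokenizer(boolean strict) {
        if (strict && IS_TOKEN_CHAR.isOverriddenBy(getClass())) {
            throw new IllegalArgumentException(
                "In strict mode, subclasses must not override isTokenChar(char).");
        }
    }

    protected boolean isTokenChar(char c) { return isTokenChar((int) c); }

    protected boolean isTokenChar(int codePoint) { return true; }
}

/** Overrides only the char-based method: rejected in strict mode. */
class CharBased extends BaseTokenizer {
    CharBased(boolean strict) { super(strict); }
    @Override protected boolean isTokenChar(char c) { return Character.isLetter(c); }
}

/** Overrides only the int-based method: accepted in strict mode. */
class CodePointBased extends BaseTokenizer {
    CodePointBased(boolean strict) { super(strict); }
    @Override protected boolean isTokenChar(int codePoint) { return Character.isLetter(codePoint); }
}
```

With these hypothetical classes, new CodePointBased(true) and new CharBased(false) construct normally, while new CharBased(true) throws IllegalArgumentException. Keeping the cache in a static final field of the base class is what lets later instantiations of the same subclass skip the reflective lookup.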