[ 
https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796027#action_12796027
 ] 

Uwe Schindler edited comment on LUCENE-2183 at 1/3/10 11:18 PM:
----------------------------------------------------------------

{quote}
There is only one exception where reflection is used... that is during ctor to 
determine if:

- you subclass a tokenizer that implements int-based methods 
- you have only implemented char-based methods 
- you request VERSION >= 3.1 
{quote}

With LUCENE-2188, this is easy and not a performance problem. Just define two 
static final fields, one for each char-based method, and check in the ctor 
whether this.getClass() overrides either of them. If it does, throw UOE. The 
result is cached per class, so further instantiations of the same class 
will not use reflection anymore:

{code}
private static final OverrideableMethod<CharTokenizer> isTokenCharMethod =
    new OverrideableMethod<CharTokenizer>(CharTokenizer.class, "isTokenChar", char.class);
private static final OverrideableMethod<CharTokenizer> normalizeMethod =
    new OverrideableMethod<CharTokenizer>(CharTokenizer.class, "normalize", char.class);
...
public CharTokenizer(...) {
  super(...);
  // Reject subclasses that still override a char-based method under 3.1+ semantics.
  if (matchVersion.onOrAfter(Version.LUCENE_31) && (
      isTokenCharMethod.isOverriddenBy(this.getClass()) ||
      normalizeMethod.isOverriddenBy(this.getClass()))) {
    throw new IllegalArgumentException("For matchVersion >= LUCENE_31, CharTokenizer "
        + "subclasses must not override isTokenChar(char) or normalize(char).");
  }
}
{code}
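
To make the cached check concrete, here is a minimal, self-contained sketch of how such an override test could work. It is not the LUCENE-2188 patch: OverrideCheck, BaseTokenizer, LegacyTokenizer and ModernTokenizer are hypothetical names invented for this illustration, and only isOverriddenBy mirrors the method name used above. The idea is to walk from the concrete class up to (but not including) the base class, look for a declaration of the method, and remember the answer per class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the helper proposed in LUCENE-2188; not the real Lucene class. */
final class OverrideCheck<C> {
  private final Class<C> baseClass;
  private final String name;
  private final Class<?>[] parameterTypes;
  // One cached answer per concrete class, so reflection runs at most once per class.
  private final Map<Class<?>, Boolean> cache = new ConcurrentHashMap<>();

  OverrideCheck(Class<C> baseClass, String name, Class<?>... parameterTypes) {
    this.baseClass = baseClass;
    this.name = name;
    this.parameterTypes = parameterTypes;
  }

  /** True if subclass, or any class between it and baseClass, declares the method. */
  boolean isOverriddenBy(Class<? extends C> subclass) {
    return cache.computeIfAbsent(subclass, this::reflectOverride);
  }

  private boolean reflectOverride(Class<?> clazz) {
    for (Class<?> c = clazz; c != null && c != baseClass; c = c.getSuperclass()) {
      try {
        c.getDeclaredMethod(name, parameterTypes);
        return true;
      } catch (NoSuchMethodException e) {
        // Not declared at this level; keep walking towards the base class.
      }
    }
    return false;
  }
}

/** Toy base class with both a char-based and an int-based variant (assumption for the demo). */
abstract class BaseTokenizer {
  protected boolean isTokenChar(char c) { return true; }
  protected boolean isTokenChar(int codePoint) { return true; }
}

class LegacyTokenizer extends BaseTokenizer {
  @Override protected boolean isTokenChar(char c) { return Character.isLetter(c); }
}

class ModernTokenizer extends BaseTokenizer {
  @Override protected boolean isTokenChar(int codePoint) { return Character.isLetter(codePoint); }
}

class OverrideCheckDemo {
  static final OverrideCheck<BaseTokenizer> IS_TOKEN_CHAR =
      new OverrideCheck<>(BaseTokenizer.class, "isTokenChar", char.class);

  public static void main(String[] args) {
    // LegacyTokenizer overrides the char-based method, ModernTokenizer does not.
    System.out.println(IS_TOKEN_CHAR.isOverriddenBy(LegacyTokenizer.class));
    System.out.println(IS_TOKEN_CHAR.isOverriddenBy(ModernTokenizer.class));
  }
}
```

Since the check is keyed on the class and held in a static field, only the first instantiation of each subclass pays for reflection; a ctor guard like the one above would then be a map lookup.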

      was (Author: thetaphi):
    {quote}
There is only one exception where reflection is used... that is during ctor to 
determine if:

- you subclass a tokenizer that implements int-based methods 
- you have only implemented char-based methods 
- you request VERSION >= 3.1 
{quote}

With LUCENE-2188, this is easy and not a performance problem. Just define two 
static final fields, one for each char-based method, and check in the ctor 
whether this.getClass() overrides either of them. If it does, throw UOE. The 
result is cached per class, so further instantiations of the same class 
will not use reflection anymore:

{code}
private static final OverrideableMethod<CharTokenizer> isTokenCharMethod =
    new OverrideableMethod<CharTokenizer>(CharTokenizer.class, "isTokenChar", char.class);
private static final OverrideableMethod<CharTokenizer> normalizeMethod =
    new OverrideableMethod<CharTokenizer>(CharTokenizer.class, "normalize", char.class);
...
public CharTokenizer(...) {
  super(...);
  // Reject subclasses that still override a char-based method under 3.1+ semantics.
  if (matchVersion.onOrAfter(Version.LUCENE_31) && (
      isTokenCharMethod.getOverrideDistance(this.getClass()) > 0 ||
      normalizeMethod.getOverrideDistance(this.getClass()) > 0)) {
    throw new IllegalArgumentException("For matchVersion >= LUCENE_31, CharTokenizer "
        + "subclasses must not override isTokenChar(char) or normalize(char).");
  }
}
{code}
  
> Supplementary Character Handling in CharTokenizer
> -------------------------------------------------
>
>                 Key: LUCENE-2183
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2183
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Simon Willnauer
>             Fix For: 3.1
>
>         Attachments: LUCENE-2183.patch
>
>
> CharTokenizer is an abstract base class for all Tokenizers operating on a 
> character level. Yet, those tokenizers still use char primitives instead of 
> int codepoints. CharTokenizer should operate on codepoints and preserve 
> backwards compatibility. 
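
As a small illustration of why char primitives fall short here, the sketch below counts letters in a string once per UTF-16 code unit and once per code point. CodePointDemo and its method names are made up for this example and are not part of Lucene; only standard java.lang calls are used. A supplementary character such as U+10400 is stored as a surrogate pair, and neither half of the pair is a letter on its own.

```java
class CodePointDemo {
  /** Counts letters by inspecting UTF-16 code units one at a time. */
  static int countLettersByChar(String s) {
    int n = 0;
    for (int i = 0; i < s.length(); i++) {
      if (Character.isLetter(s.charAt(i))) {
        n++;
      }
    }
    return n;
  }

  /** Counts letters by inspecting whole code points, stepping over surrogate pairs. */
  static int countLettersByCodePoint(String s) {
    int n = 0;
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      if (Character.isLetter(cp)) {
        n++;
      }
      i += Character.charCount(cp);
    }
    return n;
  }

  public static void main(String[] args) {
    // 'a' followed by U+10400 (DESERET CAPITAL LETTER LONG I) as a surrogate pair.
    String s = "a\uD801\uDC00";
    System.out.println(countLettersByChar(s));      // sees only 'a' as a letter
    System.out.println(countLettersByCodePoint(s)); // sees both letters
  }
}
```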
