[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated LUCENE-4642:
-------------------------------

    Attachment: LUCENE-4642-single-create-method-on-TokenizerFactory-subclasses.patch

Patch:

- {{TokenizerFactory.create(Reader)}} calls the {{AttributeFactory}}-accepting version with {{AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY}}
- {{TokenizerFactory.create(AttributeFactory, Reader)}} is made abstract
- Added {{AttributeFactory}}-accepting constructors to all {{Tokenizer}}s with existing {{TokenizerFactory}} subclasses that didn't already have them
- Removed {{create(Reader)}} from all {{TokenizerFactory}} subclasses
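
A minimal sketch of the delegation described in the list above. It does not use the real Lucene classes: {{AttributeFactory}}, {{Tokenizer}} and {{TokenizerFactory}} here are simplified stand-ins with the same names, and {{DemoTokenizer}} / {{DemoTokenizerFactory}} are hypothetical, so treat it as an illustration of the shape of the API rather than the patch itself.

```java
import java.io.Reader;
import java.io.StringReader;

// Stand-in for Lucene's AttributeFactory (assumption: reduced to a named marker).
class AttributeFactory {
  static final AttributeFactory DEFAULT_ATTRIBUTE_FACTORY = new AttributeFactory("default");
  final String name;

  AttributeFactory(String name) {
    this.name = name;
  }
}

// Stand-in for Tokenizer: it only records what it was constructed with.
abstract class Tokenizer {
  final AttributeFactory factory;
  final Reader input;

  Tokenizer(AttributeFactory factory, Reader input) {
    this.factory = factory;
    this.input = input;
  }
}

abstract class TokenizerFactory {
  // Implemented once in the base class: delegates with the default factory.
  public Tokenizer create(Reader input) {
    return create(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, input);
  }

  // The single method every subclass has to implement.
  public abstract Tokenizer create(AttributeFactory factory, Reader input);
}

// Hypothetical tokenizer with an AttributeFactory-accepting constructor.
final class DemoTokenizer extends Tokenizer {
  DemoTokenizer(AttributeFactory factory, Reader input) {
    super(factory, input);
  }
}

// Hypothetical subclass: no create(Reader) override is needed any more.
final class DemoTokenizerFactory extends TokenizerFactory {
  @Override
  public DemoTokenizer create(AttributeFactory factory, Reader input) {
    return new DemoTokenizer(factory, input);
  }
}

public class CreateDelegationSketch {
  public static void main(String[] args) {
    TokenizerFactory f = new DemoTokenizerFactory();

    // One-argument form: the base class supplies the default factory.
    Tokenizer viaDefault = f.create(new StringReader("some text"));

    // Two-argument form: the caller's factory is passed through.
    AttributeFactory custom = new AttributeFactory("custom");
    Tokenizer viaCustom = f.create(custom, new StringReader("some text"));

    System.out.println(viaDefault.factory.name);
    System.out.println(viaCustom.factory.name);
  }
}
```

With these stand-ins, the first call ends up with the default factory and the second with the caller's factory, which is the behavior the first two bullets describe.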

In this patch there is a new, even more horrible hack in {{TrieTokenizer(Factory)}}: the {{AttributeFactory}} argument to the {{TrieTokenizer}} constructor is *ignored*!!!  Surely there is a better way???:

{code:java}
public class TrieTokenizerFactory extends TokenizerFactory {
...
  @Override
  public TrieTokenizer create(AttributeFactory factory, Reader input) {
    return new TrieTokenizer(factory, input, type, TrieTokenizer.getNumericTokenStream(precisionStep));
  }
}

final class TrieTokenizer extends Tokenizer {
...
  public TrieTokenizer(Reader input, TrieTypes type, final NumericTokenStream ts) {
    this(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, input, type, ts);
  }

  public TrieTokenizer(AttributeFactory factory, Reader input, TrieTypes type, final NumericTokenStream ts) {
    // Hack #0: factory param is ignored
    // Häckidy-Hick-Hack #1: must share the attributes with the
    // NumericTokenStream we delegate to, so we create a fake factory:
    super(new AttributeFactory() {
      @Override
      public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) {
        return (AttributeImpl) ts.addAttribute(attClass);
      }
    }, input);
    // add all attributes:
    for (Iterator<Class<? extends Attribute>> it = ts.getAttributeClassesIterator(); it.hasNext();) {
      addAttribute(it.next());
    }
    this.type = type;
    this.ts = ts;
    // dates tend to be longer, especially when math is involved
    termAtt.resizeBuffer( type == TrieTypes.DATE ? 128 : 32 );
  }
...
}
{code}
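
To make the problem above concrete, here is a rough, self-contained sketch of why the "fake factory" trick forces the caller's factory to be ignored. None of these types are Lucene's: {{Attr}}, {{TermAttr}}, {{AttrFactory}}, {{AttrSource}} and {{SharingSource}} are hypothetical, heavily simplified stand-ins for {{Attribute}}, an attribute implementation, {{AttributeFactory}}, {{AttributeSource}} and {{TrieTokenizer}}. The assumption being modeled is only that an attribute source asks its factory for an instance the first time an attribute class is added.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's Attribute marker interface.
interface Attr {}

// Hypothetical attribute holding a single term.
final class TermAttr implements Attr {
  String term = "";
}

// Hypothetical stand-in for AttributeFactory.
interface AttrFactory {
  Attr create(Class<? extends Attr> attClass);
}

// Hypothetical stand-in for AttributeSource: one instance per attribute class,
// created by the factory on first use.
class AttrSource {
  private final AttrFactory factory;
  private final Map<Class<? extends Attr>, Attr> attributes = new LinkedHashMap<>();

  AttrSource(AttrFactory factory) {
    this.factory = factory;
  }

  <T extends Attr> T addAttribute(Class<T> attClass) {
    Attr att = attributes.get(attClass);
    if (att == null) {
      att = factory.create(attClass);
      attributes.put(attClass, att);
    }
    return attClass.cast(att);
  }
}

// Mirrors the shape of the TrieTokenizer constructor: the caller's factory is
// accepted but unused, because the superclass receives a factory that hands
// back the delegate's own attribute instances instead.
final class SharingSource extends AttrSource {
  SharingSource(AttrFactory ignoredCallerFactory, AttrSource delegate) {
    super(attClass -> delegate.addAttribute(attClass));
  }
}

public class SharedAttributesSketch {
  public static void main(String[] args) {
    AttrSource delegate = new AttrSource(attClass -> new TermAttr());

    // Count how often the caller-supplied factory is consulted.
    int[] callerFactoryCalls = {0};
    AttrFactory callerFactory = attClass -> {
      callerFactoryCalls[0]++;
      return new TermAttr();
    };

    AttrSource outer = new SharingSource(callerFactory, delegate);
    TermAttr seenByOuter = outer.addAttribute(TermAttr.class);
    TermAttr seenByDelegate = delegate.addAttribute(TermAttr.class);

    // A write through the delegate is visible through the outer source.
    seenByDelegate.term = "42";

    System.out.println(seenByOuter == seenByDelegate);
    System.out.println(seenByOuter.term);
    System.out.println(callerFactoryCalls[0]);
  }
}
```

In this model the two sources end up holding the same instance and the caller's factory is never invoked. Sharing instances with the delegate and honoring an arbitrary caller-supplied factory pull in opposite directions, at least as long as the sharing is done through the factory slot.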
 
                
> Add create(AttributeFactory) to TokenizerFactory and subclasses with ctors 
> taking AttributeFactory, and remove Tokenizer's and subclasses' ctors taking 
> AttributeSource
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4642
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4642
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.1
>            Reporter: Renaud Delbru
>            Assignee: Steve Rowe
>              Labels: analysis, attribute, tokenizer
>             Fix For: 4.3
>
>         Attachments: LUCENE-4642.patch, LUCENE-4642.patch, LUCENE-4642.patch, 
> LUCENE-4642.patch, 
> LUCENE-4642-single-create-method-on-TokenizerFactory-subclasses.patch, 
> TrieTokenizerFactory.java.patch
>
>
> All tokenizer implementations have a constructor that takes a given 
> AttributeSource as parameter (LUCENE-1826).  These should be removed.
> TokenizerFactory does not provide an API to create tokenizers with a given 
> AttributeFactory, but quite a few tokenizers have constructors that take an 
> AttributeFactory.  TokenizerFactory should add a create(AttributeFactory) 
> method, as should subclasses for tokenizers with AttributeFactory accepting 
> ctors.

