[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Rowe updated LUCENE-4642:
-------------------------------
    Attachment: LUCENE-4642-single-create-method-on-TokenizerFactory-subclasses.patch

Patch:

- {{TokenizerFactory.create(Reader)}} calls the {{AttributeFactory}}-accepting version with {{AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY}}
- {{TokenizerFactory.create(AttributeFactory, Reader)}} is made abstract
- Added {{AttributeFactory}}-accepting constructors to all {{Tokenizer}}s with existing {{TokenizerFactory}} subclasses that didn't already have them
- Removed {{create(Reader)}} from all {{TokenizerFactory}} subclasses.

In this patch there is a new, even more horrible hack in {{TrieTokenizer(Factory)}} - the {{AttributeFactory}} argument to the {{TrieTokenizer}} constructor is *ignored*!!! Surely there is a better way???:

{code:java}
public class TrieTokenizerFactory extends TokenizerFactory {
  ...
  @Override
  public TrieTokenizer create(AttributeFactory factory, Reader input) {
    return new TrieTokenizer(factory, input, type, TrieTokenizer.getNumericTokenStream(precisionStep));
  }
}

final class TrieTokenizer extends Tokenizer {
  ...
  public TrieTokenizer(Reader input, TrieTypes type, final NumericTokenStream ts) {
    this(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, input, type, ts);
  }

  public TrieTokenizer(AttributeFactory factory, Reader input, TrieTypes type, final NumericTokenStream ts) {
    // Hack #0: factory param is ignored
    // Häckidy-Hick-Hack #1: must share the attributes with the NumericTokenStream we delegate to, so we create a fake factory:
    super(new AttributeFactory() {
      @Override
      public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) {
        return (AttributeImpl) ts.addAttribute(attClass);
      }
    }, input);
    // add all attributes:
    for (Iterator<Class<? extends Attribute>> it = ts.getAttributeClassesIterator(); it.hasNext();) {
      addAttribute(it.next());
    }
    this.type = type;
    this.ts = ts;
    // dates tend to be longer, especially when math is involved
    termAtt.resizeBuffer( type == TrieTypes.DATE ? 128 : 32 );
  }
}
{code}

> Add create(AttributeFactory) to TokenizerFactory and subclasses with ctors taking AttributeFactory, and remove Tokenizer's and subclasses' ctors taking AttributeSource
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-4642
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4642
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>   Affects Versions: 4.1
>            Reporter: Renaud Delbru
>            Assignee: Steve Rowe
>              Labels: analysis, attribute, tokenizer
>             Fix For: 4.3
>
>         Attachments: LUCENE-4642.patch, LUCENE-4642.patch, LUCENE-4642.patch, LUCENE-4642.patch, LUCENE-4642-single-create-method-on-TokenizerFactory-subclasses.patch, TrieTokenizerFactory.java.patch
>
>
> All tokenizer implementations have a constructor that takes a given AttributeSource as parameter (LUCENE-1826). These should be removed.
> TokenizerFactory does not provide an API to create tokenizers with a given AttributeFactory, but quite a few tokenizers have constructors that take an AttributeFactory. TokenizerFactory should add a create(AttributeFactory) method, as should subclasses for tokenizers with AttributeFactory-accepting ctors.
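
For readers who want to see the shape of the API change in the patch notes (one concrete {{create(Reader)}} that forwards to an abstract {{create(AttributeFactory, Reader)}}), here is a minimal, self-contained sketch. It does not use Lucene: {{AttributeFactory}}, {{Tokenizer}} and {{TokenizerFactory}} below are simplified stand-ins, and {{SimpleTokenizerFactory}}, the {{final}} modifier on {{create(Reader)}} and the class name {{CreateDelegationSketch}} are assumptions made for illustration, not taken from the patch.

```java
import java.io.Reader;
import java.io.StringReader;

public class CreateDelegationSketch {

  /** Hypothetical stand-in for Lucene's AttributeFactory; only object identity matters here. */
  static final class AttributeFactory {
    static final AttributeFactory DEFAULT_ATTRIBUTE_FACTORY = new AttributeFactory();
  }

  /** Hypothetical stand-in for a Tokenizer that records how it was constructed. */
  static class Tokenizer {
    final AttributeFactory factory;
    final Reader input;

    Tokenizer(AttributeFactory factory, Reader input) {
      this.factory = factory;
      this.input = input;
    }
  }

  /** Stand-in for TokenizerFactory with the two create methods described in the patch notes. */
  abstract static class TokenizerFactory {
    // Convenience overload: forwards to the abstract method with the default factory.
    // Marking it final is an assumption of this sketch.
    public final Tokenizer create(Reader input) {
      return create(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, input);
    }

    // The only method a subclass has to implement.
    public abstract Tokenizer create(AttributeFactory factory, Reader input);
  }

  /** Hypothetical subclass: it implements only the AttributeFactory-accepting method. */
  static final class SimpleTokenizerFactory extends TokenizerFactory {
    @Override
    public Tokenizer create(AttributeFactory factory, Reader input) {
      return new Tokenizer(factory, input);
    }
  }

  public static void main(String[] args) {
    TokenizerFactory f = new SimpleTokenizerFactory();

    // Goes through the convenience overload, so the default factory is used.
    Tokenizer viaDefault = f.create(new StringReader("some text"));

    // Passes an explicit factory straight to the abstract method's implementation.
    AttributeFactory custom = new AttributeFactory();
    Tokenizer viaCustom = f.create(custom, new StringReader("some text"));

    System.out.println(viaDefault.factory == AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY);
    System.out.println(viaCustom.factory == custom);
  }
}
```

Unlike the {{TrieTokenizer}} constructor quoted in the message, the stand-in tokenizer in this sketch keeps whatever factory it is handed, which is the behaviour a caller of {{create(AttributeFactory, Reader)}} would normally expect.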