[
https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602894#comment-13602894
]
Steve Rowe edited comment on LUCENE-4642 at 3/14/13 11:54 PM:
--------------------------------------------------------------
Patch:
- {{TokenizerFactory.create(Reader)}} is made final, and calls the
{{AttributeFactory}}-accepting version with
{{AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY}}
- {{TokenizerFactory.create(AttributeFactory, Reader)}} is made abstract
- Added {{AttributeFactory}}-accepting constructors to all {{Tokenizer}}s with
existing {{TokenizerFactory}} subclasses that didn't already have them
- Removed {{create(Reader)}} from all {{TokenizerFactory}} subclasses.
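The shape of the change listed above can be sketched roughly as follows. This is only an illustration with made-up stand-in types ({{AttrFactory}}, {{SketchTokenizer}}, {{SketchTokenizerFactory}} and {{WhitespaceSketchFactory}} are hypothetical names, not the Lucene classes touched by the patch), assuming the final overload does nothing beyond delegating with the default factory:

```java
import java.io.Reader;

// Stand-in types for illustration only; these are NOT the Lucene classes.
interface AttrFactory {                       // hypothetical analogue of AttributeFactory
  AttrFactory DEFAULT = new AttrFactory() {}; // plays the role of DEFAULT_ATTRIBUTE_FACTORY
}

class SketchTokenizer {                       // hypothetical analogue of Tokenizer
  final AttrFactory factory;
  final Reader input;

  SketchTokenizer(AttrFactory factory, Reader input) {
    this.factory = factory;
    this.input = input;
  }
}

abstract class SketchTokenizerFactory {       // hypothetical analogue of TokenizerFactory
  // Final convenience overload: subclasses cannot override it, and it
  // always delegates to the two-argument version with the default factory.
  public final SketchTokenizer create(Reader input) {
    return create(AttrFactory.DEFAULT, input);
  }

  // The single creation method every subclass has to implement.
  public abstract SketchTokenizer create(AttrFactory factory, Reader input);
}

class WhitespaceSketchFactory extends SketchTokenizerFactory {
  @Override
  public SketchTokenizer create(AttrFactory factory, Reader input) {
    // Passes the caller's factory through to the tokenizer constructor.
    return new SketchTokenizer(factory, input);
  }
}
```

With this layout a subclass implements exactly one method, and callers that don't care about the attribute factory keep using the one-argument overload.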
In this patch there is a new, even more horrible hack in
{{TrieTokenizer(Factory)}}, shown below: the {{AttributeFactory}} argument to
the {{TrieTokenizer}} constructor is *ignored*!!! Surely there is a better way???
{code:java}
public class TrieTokenizerFactory extends TokenizerFactory {
  ...
  @Override
  public TrieTokenizer create(AttributeFactory factory, Reader input) {
    return new TrieTokenizer(factory, input, type,
        TrieTokenizer.getNumericTokenStream(precisionStep));
  }
}

final class TrieTokenizer extends Tokenizer {
  ...
  public TrieTokenizer(Reader input, TrieTypes type, final NumericTokenStream ts) {
    this(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY, input, type, ts);
  }

  public TrieTokenizer(AttributeFactory factory, Reader input, TrieTypes type,
      final NumericTokenStream ts) {
    // Hack #0: factory param is ignored
    // Häckidy-Hick-Hack #1: must share the attributes with the
    // NumericTokenStream we delegate to, so we create a fake factory:
    super(new AttributeFactory() {
      @Override
      public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) {
        return (AttributeImpl) ts.addAttribute(attClass);
      }
    }, input);
    // add all attributes:
    for (Iterator<Class<? extends Attribute>> it = ts.getAttributeClassesIterator();
        it.hasNext();) {
      addAttribute(it.next());
    }
    this.type = type;
    this.ts = ts;
    // dates tend to be longer, especially when math is involved
    termAtt.resizeBuffer(type == TrieTypes.DATE ? 128 : 32);
  }
{code}
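To make the attribute-sharing trick in the excerpt above easier to follow, here is a stripped-down sketch of the same idea using stand-in types ({{SketchSource}} and {{SharingDemo}} are hypothetical names, not Lucene API). It assumes an attribute source that keeps one instance per attribute class and asks its factory only when an instance is missing; the "fake factory" then simply hands out the delegate's own instances:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Stand-in for an attribute source; this is NOT the Lucene AttributeSource.
class SketchSource {
  private final Function<Class<?>, Object> factory;
  private final Map<Class<?>, Object> attributes = new LinkedHashMap<>();

  SketchSource(Function<Class<?>, Object> factory) {
    this.factory = factory;
  }

  // Returns the existing instance for this class, or asks the factory for one.
  Object addAttribute(Class<?> attClass) {
    return attributes.computeIfAbsent(attClass, factory);
  }
}

class SharingDemo {
  // Ordinary factory: a fresh instance via the no-arg constructor.
  static Object newInstance(Class<?> c) {
    try {
      return c.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  // "Fake factory": every request is answered with the delegate's instance,
  // so the new source and the delegate end up holding the same objects.
  // Note that no caller-supplied factory is consulted anywhere.
  static SketchSource sharingWith(SketchSource delegate) {
    return new SketchSource(delegate::addAttribute);
  }
}
```

In this sketch, calling {{addAttribute}} on the source returned by {{sharingWith(delegate)}} yields the very object the delegate holds, which is the sharing the hack relies on. It also shows why a factory passed in by the caller has no obvious place to plug in.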
> Add create(AttributeFactory) to TokenizerFactory and subclasses with ctors
> taking AttributeFactory, and remove Tokenizer's and subclasses' ctors taking
> AttributeSource
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-4642
> URL: https://issues.apache.org/jira/browse/LUCENE-4642
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 4.1
> Reporter: Renaud Delbru
> Assignee: Steve Rowe
> Labels: analysis, attribute, tokenizer
> Fix For: 4.3
>
> Attachments: LUCENE-4642.patch, LUCENE-4642.patch, LUCENE-4642.patch,
> LUCENE-4642.patch,
> LUCENE-4642-single-create-method-on-TokenizerFactory-subclasses.patch,
> TrieTokenizerFactory.java.patch
>
>
> All tokenizer implementations have a constructor that takes a given
> AttributeSource as parameter (LUCENE-1826). These should be removed.
> TokenizerFactory does not provide an API to create tokenizers with a given
> AttributeFactory, but quite a few tokenizers have constructors that take an
> AttributeFactory. TokenizerFactory should add a create(AttributeFactory)
> method, as should subclasses for tokenizers with AttributeFactory accepting
> ctors.