Erick Erickson created SOLR-10186:
-------------------------------------
Summary: Allow CharTokenizer-derived tokenizers and
KeywordTokenizer to configure the max token length
Key: SOLR-10186
URL: https://issues.apache.org/jira/browse/SOLR-10186
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Erick Erickson
Priority: Minor
Is there a good reason that we hard-code a 256 character limit for the
CharTokenizer? Changing this limit currently requires copying/pasting the
incrementToken method into a new class, since incrementToken is final.
KeywordTokenizer can easily change its default (also 256), but doing so
requires writing code rather than configuring it in the schema.
For KeywordTokenizer, this is Solr-only. For the CharTokenizer-derived classes
(WhitespaceTokenizer, UnicodeWhitespaceTokenizer and LetterTokenizer) and their
factories, it would take adding a c'tor to the base class in Lucene and using
it in the factory.
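As a sketch of what the schema side could look like: a factory attribute that
passes the limit down to the tokenizer. The attribute name "maxTokenLen" below
is purely illustrative, since this issue proposes the capability but doesn't
fix a name:

```xml
<!-- Hypothetical config: attribute name is an illustration, not an existing option -->
<fieldType name="long_keyword" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" maxTokenLen="1024"/>
  </analyzer>
</fieldType>

<fieldType name="long_ws" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" maxTokenLen="1024"/>
  </analyzer>
</fieldType>
```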
Any objections?
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)