Erick Erickson created SOLR-10186:
-------------------------------------
Summary: Allow CharTokenizer-derived tokenizers and
KeywordTokenizer to configure the max token length
Key: SOLR-10186
URL: https://issues.apache.org/jira/browse/SOLR-10186
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Erick Erickson
Priority: Minor
Is there a good reason that we hard-code a 256 character limit for the
CharTokenizer? Changing this limit currently requires copying/pasting the
incrementToken method into a new class, since incrementToken is final.
KeywordTokenizer can easily change its default (also 256), but doing so
requires writing code rather than configuring it in the schema.
For KeywordTokenizer, this is Solr-only. For the CharTokenizer-derived classes
(WhitespaceTokenizer, UnicodeWhitespaceTokenizer and LetterTokenizer) and their
factories, it would take adding a c'tor to the base class in Lucene and using
it in the factory.
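As a sketch of what the schema side could look like: a factory attribute that
passes the limit down to the tokenizer. The attribute name "maxTokenLen" below
is purely illustrative, since this issue proposes the capability but doesn't
fix a name:

```xml
<!-- Hypothetical config: attribute name is an illustration, not an existing option -->
<fieldType name="long_keyword" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" maxTokenLen="1024"/>
  </analyzer>
</fieldType>

<fieldType name="long_ws" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" maxTokenLen="1024"/>
  </analyzer>
</fieldType>
```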
Any objections?
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)