[ https://issues.apache.org/jira/browse/LUCENE-7705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884739#comment-15884739 ]
Amrit Sarkar commented on LUCENE-7705:
--------------------------------------
Erick,
I wrote the test cases and this is indeed a problem, but removing "maxTokenLen"
from the original arguments used to initialize LowerCaseFilterFactory makes
sense, and it is not a hack. The argument has to be removed somewhere before the
FilterFactory is initialized, and the best place to do that is where we make the
call. I am not inclined to remove it in the FilterFactory init or in the
AbstractAnalysisFactory call. That leaves two options: either we don't offer
maxTokenLen for LowerCaseTokenizer at all, or we remove the extra argument as
you have done in getMultiTermComponent().
Let me know your thoughts.
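
To make the second option concrete, here is a minimal sketch of the idea, not the patch itself. It assumes the factory's original arguments are available as a Map<String,String>; the class name MultiTermArgsSketch and the helper filterArgs are hypothetical and exist only for illustration. The point is that the copy handed to the filter factory no longer carries "maxTokenLen", while the tokenizer factory's own arguments stay untouched.

```java
import java.util.HashMap;
import java.util.Map;

public class MultiTermArgsSketch {
    // Name of the tokenizer-only argument discussed in this issue.
    static final String MAX_TOKEN_LEN = "maxTokenLen";

    // Hypothetical helper: returns a copy of the original arguments
    // without "maxTokenLen". The caller's map is not modified.
    static Map<String, String> filterArgs(Map<String, String> originalArgs) {
        Map<String, String> copy = new HashMap<>(originalArgs);
        copy.remove(MAX_TOKEN_LEN);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> original = new HashMap<>();
        original.put("luceneMatchVersion", "7.0.0");
        original.put(MAX_TOKEN_LEN, "1024");

        Map<String, String> forFilter = filterArgs(original);
        System.out.println(forFilter.containsKey(MAX_TOKEN_LEN)); // false
        System.out.println(original.containsKey(MAX_TOKEN_LEN));  // true
    }
}
```

In getMultiTermComponent() the returned copy would be what gets passed to the LowerCaseFilterFactory constructor, which otherwise rejects arguments it does not recognize.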
> Allow CharTokenizer-derived tokenizers and KeywordTokenizer to configure the
> max token length
> ---------------------------------------------------------------------------------------------
>
> Key: LUCENE-7705
> URL: https://issues.apache.org/jira/browse/LUCENE-7705
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Amrit Sarkar
> Assignee: Erick Erickson
> Priority: Minor
> Attachments: LUCENE-7705.patch, LUCENE-7705.patch, LUCENE-7705.patch
>
>
> SOLR-10186
> [~erickerickson]: Is there a good reason that we hard-code a 256 character
> limit for the CharTokenizer? Changing this limit requires people to
> copy/paste incrementToken into a new class, since incrementToken is final.
> KeywordTokenizer can easily change the default (which is also 256), but
> doing so requires code rather than being configurable in the schema.
> For KeywordTokenizer, this is Solr-only. For the CharTokenizer classes
> (WhitespaceTokenizer, UnicodeWhitespaceTokenizer and LetterTokenizer) and
> their factories, it would take adding a c'tor to the base class in Lucene
> and using it in the factory.
> Any objections?