[jira] [Comment Edited] (LUCENE-7762) Add EnglishAnalyzer.setMaxTokenLength

Uwe Schindler (JIRA) Fri, 31 Mar 2017 04:09:56 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950690#comment-15950690 ]


Uwe Schindler edited comment on LUCENE-7762 at 3/31/17 11:09 AM:
-----------------------------------------------------------------

I agree with Robert. When implementing CustomAnalyzer, my "larger plan" was 
already to remove all hardcoded Analyzer "examples" from the source code. This 
would also reduce the size of the analysis jars and the number of classes that 
confuse users. My idea would be to just have the current Analyzers as static 
final "constants" in some "utility" class, one for each language (e.g., lazily 
initialized in {{Analyzers.get(Locale.ENGLISH)}} with a Java 8 lambda that 
returns a CustomAnalyzer; {{Locale}} was just an idea, it could also be an 
enum).
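
To make the lazy-initialization idea concrete, here is a minimal sketch of such 
a registry. It is only an illustration under stated assumptions: the class name 
{{LazyAnalyzers}} and its methods are hypothetical and do not exist in Lucene, 
and the type parameter {{A}} stands in for Lucene's {{Analyzer}} so that the 
sketch compiles with the JDK alone.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical per-language registry (not a Lucene class).
 * A is a placeholder for the analyzer type, e.g. Lucene's Analyzer.
 */
final class LazyAnalyzers<A> {
  // Factories are cheap lambdas; nothing is built until get() is called.
  private final Map<Locale, Supplier<? extends A>> factories = new ConcurrentHashMap<>();
  // Instances created so far, one per language.
  private final Map<Locale, A> instances = new ConcurrentHashMap<>();

  /** Registers a factory for a language; returns this for chaining. */
  LazyAnalyzers<A> register(Locale locale, Supplier<? extends A> factory) {
    factories.put(locale, factory);
    return this;
  }

  /** Returns the shared instance for a language, creating it on first use. */
  A get(Locale locale) {
    Supplier<? extends A> factory = factories.get(locale);
    if (factory == null) {
      throw new IllegalArgumentException("No analyzer registered for " + locale);
    }
    // computeIfAbsent runs the factory at most once per key.
    return instances.computeIfAbsent(locale, l -> factory.get());
  }
}
```

With Lucene on the classpath, {{A}} would be {{Analyzer}} and each registered 
lambda would return a configured CustomAnalyzer. An enum key would work the 
same way as {{Locale}} here.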

Users who want analyzers with custom stopwords and so on can use the builder 
pattern of CustomAnalyzer. It then looks like configuring an analyzer in Solr 
or Elasticsearch.
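
As a rough illustration of the builder pattern mentioned above, the following 
sketch assembles an English-style chain with a configurable max token length. 
It assumes the Lucene analyzers-common module is on the classpath. The class 
{{EnglishLikeAnalyzers}} and its two methods are hypothetical names, and the 
filter chain only approximates what EnglishAnalyzer does; it is not guaranteed 
to be identical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Hypothetical helper class, not part of Lucene. */
final class EnglishLikeAnalyzers {

  /** Builds an English-style analyzer; maxTokenLength is passed to the tokenizer factory. */
  static Analyzer english(int maxTokenLength) {
    try {
      return CustomAnalyzer.builder()
          // Factories are looked up by name, much like in a Solr schema.
          .withTokenizer("standard", "maxTokenLength", Integer.toString(maxTokenLength))
          .addTokenFilter("englishPossessive")
          .addTokenFilter("lowercase")
          .addTokenFilter("stop") // no "words" parameter: the factory's default stop set
          .addTokenFilter("porterStem")
          .build();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Runs the analyzer over the text and collects the emitted terms. */
  static List<String> tokens(Analyzer analyzer, String text) {
    List<String> out = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("field", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }
}
```

A custom stopword list could be supplied by passing parameters such as 
{{"words"}} and {{"format"}} to the {{"stop"}} filter, in the same way as the 
tokenizer parameter above.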


> Add EnglishAnalyzer.setMaxTokenLength
> -------------------------------------
>
>                 Key: LUCENE-7762
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7762
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master (7.0), 6.6
>
>
> I think EnglishAnalyzer should also let you change the default (255) max 
> token length of the StandardTokenizer it's invoking.
> I will also fold the javadoc fixes from LUCENE-7760 here.


