[ 
https://issues.apache.org/jira/browse/LUCENE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921162#comment-13921162
 ] 

Robert Muir commented on LUCENE-5490:
-------------------------------------

Also, MAX_TERM_LENGTH is in UTF-8 bytes, but the length LengthFilter checks is 
in UTF-16 code units. So I think MAX_TERM_LENGTH is not a great default for max.

Would MAX_TERM_LENGTH/3 be better? A UTF-16 code unit never takes more than 3 
bytes in UTF-8, so if you use LengthFilter out of the box because you tried to 
index a video file or something (and with Java's default decoding this is likely 
to contain many U+FFFD replacement characters, 3 bytes each in UTF-8), you won't 
ever hit the IndexWriter limit.
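
To make the arithmetic concrete, here is a minimal, self-contained sketch. It 
does not call Lucene: the constant below is an assumption that mirrors 
IndexWriter.MAX_TERM_LENGTH (32766 in the Lucene versions I am aware of; check 
your own), and the class and helper names are hypothetical. It only shows that a 
term of MAX_TERM_LENGTH/3 UTF-16 code units stays within the byte limit even 
when every character is U+FFFD, while a term of MAX_TERM_LENGTH code units does 
not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TermLengthDemo {
    // Assumption: stands in for IndexWriter.MAX_TERM_LENGTH, which is a UTF-8 byte count.
    static final int MAX_TERM_LENGTH = 32766;

    // Number of bytes the term occupies once encoded as UTF-8.
    static int utf8Length(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }

    // Builds a string made of n copies of the given char (n UTF-16 code units).
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        // Proposed default for LengthFilter's max, in UTF-16 code units.
        int safeMax = MAX_TERM_LENGTH / 3;

        // Worst case for a single code unit: U+FFFD needs 3 bytes in UTF-8.
        String worstCase = repeat('\uFFFD', safeMax);
        System.out.println("code units: " + worstCase.length()
                + ", UTF-8 bytes: " + utf8Length(worstCase)
                + ", within limit: " + (utf8Length(worstCase) <= MAX_TERM_LENGTH));

        // With max = MAX_TERM_LENGTH code units, the same input overflows the byte limit.
        String tooLong = repeat('\uFFFD', MAX_TERM_LENGTH);
        System.out.println("code units: " + tooLong.length()
                + ", UTF-8 bytes: " + utf8Length(tooLong)
                + ", within limit: " + (utf8Length(tooLong) <= MAX_TERM_LENGTH));
    }
}
```

Supplementary characters do not change the bound: they take two UTF-16 code 
units and four UTF-8 bytes, so two bytes per code unit.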

> make LengthFilterFactory's min/max args have sensible defaults
> --------------------------------------------------------------
>
>                 Key: LUCENE-5490
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5490
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Hoss Man
>            Priority: Minor
>
> LengthFilterFactory's min/max args are currently required, but it seems like 
> we could give them sensible defaults and make them optional...
> min = 0
> max = IndexWriter.MAX_TERM_LENGTH



--
This message was sent by Atlassian JIRA
(v6.2#6252)
