[
https://issues.apache.org/jira/browse/LUCENE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974932#comment-13974932
]
Mike Sokolov commented on LUCENE-5620:
--------------------------------------
bq. doing this selectively (only adding additional terms in some cases) is
pretty complicated if you don't want to screw over length normalization
Interesting point, although it's debatable how strong the effect is. I guess
it depends on how many tokens the filter chain affects, and on whether that
number varies significantly from document to document. I tend to think
that the number of capitalized words, say, will be similar from document to
document, but of course there will be exceptions in some data sets.
It makes me wonder whether length normalization shouldn't use the max position
instead of the term count when position information is available.
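
To make the term-count versus max-position difference concrete, here is a rough sketch in plain Java. It does not use any Lucene classes. The Token class, the lowerCase helper, and the order in which the two terms are emitted are all assumptions made for illustration, not the behavior of the attached patch. It models a filter that stacks the original token at the same position (position increment 0) whenever lowercasing changes the term, then measures document length both ways.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java model of a token stream; no Lucene classes are used.
class PreserveOriginalLengthSketch {

    // A term plus its position increment (0 = stacked on the previous token).
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Hypothetical model of the filter: always emit the lowercased term; when
    // preserveOriginal is set and lowercasing changed the term, also emit the
    // original at the same position. The emission order is an assumption.
    static List<Token> lowerCase(List<String> words, boolean preserveOriginal) {
        List<Token> out = new ArrayList<>();
        for (String w : words) {
            String lower = w.toLowerCase(Locale.ROOT);
            out.add(new Token(lower, 1));
            if (preserveOriginal && !lower.equals(w)) {
                out.add(new Token(w, 0));
            }
        }
        return out;
    }

    // Document length measured as the number of emitted terms.
    static int termCount(List<Token> tokens) {
        return tokens.size();
    }

    // Document length measured as the last occupied position,
    // i.e. the sum of all position increments.
    static int maxPosition(List<Token> tokens) {
        int pos = 0;
        for (Token t : tokens) {
            pos += t.posInc;
        }
        return pos;
    }

    public static void main(String[] args) {
        List<String> plain = Arrays.asList("the", "quick", "fox");
        List<String> capitalized = Arrays.asList("The", "Quick", "Fox");
        for (List<String> doc : Arrays.asList(plain, capitalized)) {
            List<Token> tokens = lowerCase(doc, true);
            System.out.println(doc + " termCount=" + termCount(tokens)
                    + " maxPosition=" + maxPosition(tokens));
        }
    }
}
```

In this model the all-capitalized document has twice the term count of the all-lowercase one, while both end at the same max position, so only a term-count-based length is sensitive to how many tokens the filter duplicates.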
> LowerCaseFilter.preserveOriginal
> --------------------------------
>
> Key: LUCENE-5620
> URL: https://issues.apache.org/jira/browse/LUCENE-5620
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Mike Sokolov
> Attachments: LUCENE-5620.patch
>
>
> Following closely the model of LUCENE-5437 (which worked on
> ASCIIFoldingFilter), this patch adds the ability to preserve the original
> token to LowerCaseFilter. This is useful if you want an all-lowercase search
> term to match without regard to case, while search terms with uppercase
> letters match in a case-sensitive manner.