[ https://issues.apache.org/jira/browse/LUCENE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-1118.
----------------------------------------
Resolution: Fixed
> core analyzers should not produce tokens > N (100?) characters in length
> ------------------------------------------------------------------------
>
> Key: LUCENE-1118
> URL: https://issues.apache.org/jira/browse/LUCENE-1118
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-1118.patch
>
>
> Discussion that led to this:
> http://www.gossamer-threads.com/lists/lucene/java-dev/56103
> I believe nearly any time a token > 100 characters in length is
> produced, it's a bug in the analysis that the user is not aware of.
> These long tokens cause all sorts of problems downstream, so it's
> best to catch them early at the source.
> We can accomplish this by tacking on a LengthFilter onto the chains
> for StandardAnalyzer, SimpleAnalyzer, WhitespaceAnalyzer, etc.
> Should we do this in 2.3? I realize this is technically a break in
> backwards compatibility; however, I think it must be incredibly rare
> for this change to break anything real in an application.
--
This message is automatically generated by JIRA.