[ https://issues.apache.org/jira/browse/LUCENE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-1118.
----------------------------------------
Resolution: Fixed
> core analyzers should not produce tokens > N (100?) characters in length
> ------------------------------------------------------------------------
>
> Key: LUCENE-1118
> URL: https://issues.apache.org/jira/browse/LUCENE-1118
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-1118.patch
>
>
> Discussion that led to this:
> http://www.gossamer-threads.com/lists/lucene/java-dev/56103
> I believe nearly any time a token > 100 characters in length is
> produced, it's a bug in the analysis that the user is not aware of.
> These long tokens cause all sorts of problems downstream, so it's
> best to catch them early at the source.
> We can accomplish this by tacking on a LengthFilter onto the chains
> for StandardAnalyzer, SimpleAnalyzer, WhitespaceAnalyzer, etc.
> Should we do this in 2.3? I realize this is technically a break in
> backwards compatibility; however, I think it must be incredibly rare
> for this change to break anything real in an application.
--
This message is automatically generated by JIRA.