[
https://issues.apache.org/jira/browse/LUCENE-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13512077#comment-13512077
]
Steven Rowe commented on LUCENE-4587:
-------------------------------------
+1, nice random test!
Small suggestion:
{{TestWordBreakSpellChecker.goodTestString()}} rejects candidate combined
terms that contain whitespace, but there are more whitespace characters than
the ones you check for explicitly. Something like this would probably be
faster and more complete:
{code:java}
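// Requires: import java.util.regex.Pattern;
private static final Pattern WHITESPACE_PATTERN = Pattern.compile("\\s");

// A usable test string has at least two code points and no whitespace.
private boolean goodTestString(String s) {
  return s.codePointCount(0, s.length()) >= 2
      && !WHITESPACE_PATTERN.matcher(s).find();
}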
{code}
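To show how the suggested check behaves, here is a minimal, self-contained sketch. The class name {{WhitespaceCheckSketch}}, the constants {{ASCII_WS}} and {{UNICODE_WS}}, and the two-argument form of {{goodTestString}} are hypothetical names used only for illustration; they are not taken from the patch. One caveat it demonstrates: in {{java.util.regex}}, a plain {{\s}} matches only the ASCII whitespace characters unless the pattern is compiled with {{Pattern.UNICODE_CHARACTER_CLASS}}, which assumes Java 7 or later.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of the Lucene test.
class WhitespaceCheckSketch {
    // Default \s: ASCII whitespace only, i.e. [ \t\n\x0B\f\r].
    static final Pattern ASCII_WS = Pattern.compile("\\s");

    // Assumption: Java 7+. With this flag, \s follows the Unicode
    // White_Space property and so also matches U+3000.
    static final Pattern UNICODE_WS =
        Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS);

    // Same logic as the suggested helper, with the pattern passed in
    // so both variants can be compared.
    static boolean goodTestString(String s, Pattern whitespace) {
        return s.codePointCount(0, s.length()) >= 2
            && !whitespace.matcher(s).find();
    }

    public static void main(String[] args) {
        String ideographicSpace = "a\u3000b"; // U+3000 IDEOGRAPHIC SPACE
        // One supplementary code point (U+1D11E), stored as two chars.
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(goodTestString("ab", ASCII_WS));               // true
        System.out.println(goodTestString("a\tb", ASCII_WS));             // false
        System.out.println(goodTestString(ideographicSpace, ASCII_WS));   // true
        System.out.println(goodTestString(ideographicSpace, UNICODE_WS)); // false
        System.out.println(goodTestString(clef, ASCII_WS));               // false
    }
}
```

The last case is the reason for using {{codePointCount}} rather than {{length()}}: a single supplementary character occupies two chars but is still only one code point, so it is rejected as too short.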
> WordBreakSpellChecker treats bytes as chars
> -------------------------------------------
>
> Key: LUCENE-4587
> URL: https://issues.apache.org/jira/browse/LUCENE-4587
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/spellchecker
> Affects Versions: 4.0
> Reporter: Andreas Hubold
> Assignee: James Dyer
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4587.patch
>
>
> Originally opened as SOLR-4115.