donnerpeter commented on a change in pull request #2376:

URL: https://github.com/apache/lucene-solr/pull/2376#discussion_r576834208
##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1571,14 +1549,28 @@ boolean hasFlag(int entryId, char flag) {
     return flagLookup.hasFlag(entryId, flag);
   }
 
-  CharSequence cleanInput(CharSequence input, StringBuilder reuse) {
-    return cleanInput(input, input.length(), reuse);
+  boolean mayNeedInputCleaning() {
+    return ignoreCase || ignore != null || iconv != null;
+  }
+
+  boolean needsInputCleaning(CharSequence input) {
+    if (mayNeedInputCleaning()) {
+      for (int i = 0; i < input.length(); i++) {
+        char ch = input.charAt(i);
+        if (ignore != null && Arrays.binarySearch(ignore, ch) >= 0
+            || ignoreCase && caseFold(ch) != ch
+            || iconv != null && iconv.mightReplaceChar(ch)) {

Review comment:
Ideally, `cleanInput` itself would check all of this and just return its input unchanged when there's nothing to clean, but that's not trivial if we want to avoid allocations via its `StringBuilder reuse` parameter. Maybe later.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
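The idea floated in the review comment, letting `cleanInput` detect the no-op case itself so callers don't need a separate `needsInputCleaning` pass, might look roughly like the sketch below. It is a hypothetical, simplified illustration and not Lucene's code: the class `InputCleanerSketch` is invented, `ignore` is assumed to be a sorted `char[]`, case folding is approximated with `Character.toLowerCase` (real Hunspell folding can be locale-dependent), and `iconv` replacement is left out entirely. The point it shows is that the fast path returns the original `CharSequence` without touching `reuse`, and only the slow path copies into the caller-supplied `StringBuilder`.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Lucene's Dictionary code: a cleaner that returns its
 * input unchanged when no character needs cleaning, so the common case copies
 * nothing into the reusable StringBuilder.
 * Assumptions: `ignore` is a set of chars to strip, case folding is plain
 * Character.toLowerCase, and iconv-style replacement is omitted.
 */
public class InputCleanerSketch {
  private final char[] ignore;      // sorted chars to strip, or null
  private final boolean ignoreCase; // whether to fold case

  public InputCleanerSketch(char[] ignore, boolean ignoreCase) {
    this.ignore = ignore == null ? null : ignore.clone();
    if (this.ignore != null) Arrays.sort(this.ignore); // binarySearch needs sorted input
    this.ignoreCase = ignoreCase;
  }

  // Assumed simplification of Hunspell case folding.
  private static char caseFold(char ch) {
    return Character.toLowerCase(ch);
  }

  private boolean isIgnored(char ch) {
    return ignore != null && Arrays.binarySearch(ignore, ch) >= 0;
  }

  private boolean needsChange(char ch) {
    return isIgnored(ch) || ignoreCase && caseFold(ch) != ch;
  }

  /** Returns `input` itself if nothing changes; otherwise fills and returns `reuse`. */
  public CharSequence cleanInput(CharSequence input, StringBuilder reuse) {
    // Find the first character that would be altered or removed.
    int first = 0;
    while (first < input.length() && !needsChange(input.charAt(first))) {
      first++;
    }
    if (first == input.length()) {
      return input; // fast path: no copying, reuse left untouched
    }
    reuse.setLength(0);
    reuse.append(input, 0, first); // copy the unchanged prefix as-is
    for (int i = first; i < input.length(); i++) {
      char ch = input.charAt(i);
      if (isIgnored(ch)) continue; // drop ignored characters
      reuse.append(ignoreCase ? caseFold(ch) : ch);
    }
    return reuse;
  }

  public static void main(String[] args) {
    InputCleanerSketch cleaner = new InputCleanerSketch(new char[] {'-'}, true);
    StringBuilder reuse = new StringBuilder();
    String plain = "word";
    System.out.println(cleaner.cleanInput(plain, reuse) == plain); // true
    System.out.println(cleaner.cleanInput("Co-Op", reuse));        // coop
  }
}
```

With this shape, callers must use the return value rather than reading `reuse` directly, since on the fast path `reuse` is left untouched and may still hold content from an earlier call.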