Namgyu Kim created LUCENE-8698:
----------------------------------
Summary: Fix replaceIgnoreCase method bug in EscapeQuerySyntaxImpl
Key: LUCENE-8698
URL: https://issues.apache.org/jira/browse/LUCENE-8698
Project: Lucene - Core
Issue Type: Bug
Components: core/queryparser
Reporter: Namgyu Kim
It is a patch of LUCENE-8572 issue from [~tonicava].
There is a serious bug in the replaceIgnoreCase method of the
EscapeQuerySyntaxImpl class.
This issue can affect QueryNode. (StringIndexOutOfBoundsException)
As I mentioned in comment of the issue, the String#toLowerCase() causes the
array to grow in size.
{code:java}
private static CharSequence replaceIgnoreCase(CharSequence string,
CharSequence sequence1, CharSequence escapeChar, Locale locale) {
// string = "İpone " [304, 112, 111, 110, 101, 32], size = 6
...
while (start < count) {
// Convert by toLowerCase as follows.
// string = "i'̇pone " [105, 775, 112, 111, 110, 101, 32], size = 7
// firstIndex will be set 6.
if ((firstIndex = string.toString().toLowerCase(locale).indexOf(first,
start)) == -1)
break;
boolean found = true;
...
if (found) {
// In this line, String.toString() will only have a range of 0 to 5.
// So here we get a StringIndexOutOfBoundsException.
result.append(string.toString().substring(copyStart, firstIndex));
...
} else {
start = firstIndex + 1;
}
}
...
}{code}
Maintaining the overall structure and fixing bug is very simple.
If we change to the following code, the method works fine.
{code:java}
// Line 135 ~ 136
// BEFORE
if ((firstIndex = string.toString().toLowerCase(locale).indexOf(first, start))
== -1)
// AFTER
if ((firstIndex = string.toString().indexOf(first, start)) == -1)
{code}
But I wonder if this is the best way.
How do you think about using String#replace() instead?
{code:java}
// SAMPLE : escapeWhiteChar (escapeChar and escapeQuoted are same)
// BEFORE
private static final CharSequence escapeWhiteChar(CharSequence str,
Locale locale) {
...
for (int i = 0; i < escapableWhiteChars.length; i++) {
buffer = replaceIgnoreCase(buffer,
escapableWhiteChars[i].toLowerCase(locale),
"\\", locale);
}
...
}
// AFTER
private static final CharSequence escapeWhiteChar(CharSequence str,
Locale locale) {
...
for (int i = 0; i < escapableWhiteChars.length; i++) {
buffer = buffer.toString().replace(escapableWhiteChars[i], "\\" +
escapableWhiteChars[i]);
}
...
}
{code}
First, I upload the patch using String#replace().
If you give me some feedback, I will check it :D
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]