[
https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705249#comment-16705249
]
Namgyu Kim edited comment on LUCENE-8572 at 11/30/18 8:37 PM:
--------------------------------------------------------------
Hi, [~romseygeek], [~thetaphi].
I checked the issue and found that it looks like a logic error.
First, I don't think it's a Locale problem; the problem is in the replace
algorithm (replaceIgnoreCase) itself.
If you look at escapeWhiteChar(), it calls replaceIgnoreCase() internally
(escapeTerm() -> escapeWhiteChar() -> replaceIgnoreCase()).
{code:java}
private static CharSequence replaceIgnoreCase(CharSequence string,
    CharSequence sequence1, CharSequence escapeChar, Locale locale) {
  // string = "İpone " [304, 112, 111, 110, 101, 32], size = 6
  ...
  while (start < count) {
    // toLowerCase(locale) turns it into a LONGER string:
    // "i̇pone " [105, 775, 112, 111, 110, 101, 32], size = 7
    // (outside the Turkish/Azerbaijani locales, U+0130 becomes
    // 'i' + U+0307 COMBINING DOT ABOVE).
    // For the space, firstIndex is therefore set to 6.
    if ((firstIndex = string.toString().toLowerCase(locale).indexOf(first,
        start)) == -1)
      break;
    boolean found = true;
    ...
    if (found) {
      // firstIndex (6) belongs to the 7-char lowercased string, but it is
      // used to cut the 6-char original string, whose valid indices are
      // only 0 to 5. Reading the match at that position runs past the end
      // of the original, so here we get a StringIndexOutOfBoundsException.
      result.append(string.toString().substring(copyStart, firstIndex));
      ...
    } else {
      start = firstIndex + 1;
    }
  }
  ...
}
{code}
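The length change can be reproduced with plain java.lang.String, outside Lucene. This is a minimal standalone sketch: the class name LowerCaseLengthDemo and the helper indexInLowerCased are hypothetical names for illustration, and Locale.ROOT stands in for any non-Turkish locale.

{code:java}
import java.util.Locale;

public class LowerCaseLengthDemo {

  // Hypothetical helper that mirrors how replaceIgnoreCase() computes
  // firstIndex: the search runs on a lowercased copy, not on the input itself.
  public static int indexInLowerCased(String s, char target, Locale locale) {
    return s.toLowerCase(locale).indexOf(target);
  }

  public static void main(String[] args) {
    // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, followed by "pone ".
    String original = "\u0130pone ";
    // Outside the Turkish/Azerbaijani locales, U+0130 lowercases to two chars:
    // 'i' (U+0069) followed by U+0307 COMBINING DOT ABOVE.
    String lowered = original.toLowerCase(Locale.ROOT);

    System.out.println(original.length()); // prints 6
    System.out.println(lowered.length());  // prints 7

    int firstIndex = indexInLowerCased(original, ' ', Locale.ROOT);
    System.out.println(firstIndex);        // prints 6

    try {
      // Using that index on the ORIGINAL string: substring(6, 7) on a
      // 6-char string reads past the end.
      original.substring(firstIndex, firstIndex + 1);
    } catch (StringIndexOutOfBoundsException e) {
      System.out.println("StringIndexOutOfBoundsException");
    }
  }
}
{code}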
The fix itself should be straightforward.
But what do you think about using
{code:java}
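public static final CharSequence escapeWhiteChar(CharSequence str,
    Locale locale) {
  ...
  for (int i = 0; i < escapableWhiteChars.length; i++) {
    // Use String.replace(), which matches literally on the original string,
    // so no lowercased copy with different indices is involved.
    // Keep the whitespace char and prefix it with the escape backslash.
    buffer = buffer.toString().replace(escapableWhiteChars[i],
        "\\" + escapableWhiteChars[i]);
  }
  return buffer;
}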
{code}
instead of
{code:java}
public static final CharSequence escapeWhiteChar(CharSequence str,
    Locale locale) {
  ...
  for (int i = 0; i < escapableWhiteChars.length; i++) {
    // Current implementation.
    buffer = replaceIgnoreCase(buffer,
        escapableWhiteChars[i].toLowerCase(locale), "\\", locale);
  }
  return buffer;
}
{code}
in the escapeWhiteChar method?
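A self-contained sketch of the String.replace approach, assuming "escaping" means keeping each whitespace character and inserting a backslash before it (a bare replace(white, "\\") would drop the character instead). The class name and the ESCAPABLE_WHITE_CHARS list are hypothetical; the real escapableWhiteChars array is not shown in this comment.

{code:java}
import java.util.Locale;

public class EscapeWhiteCharSketch {

  // Hypothetical whitespace list for illustration only; not the real
  // escapableWhiteChars array from EscapeQuerySyntaxImpl.
  private static final String[] ESCAPABLE_WHITE_CHARS = { " ", "\t", "\n", "\r" };

  // String.replace(CharSequence, CharSequence) matches literally on the
  // original string, so only one index space exists. The locale parameter
  // only mirrors the existing signature and is unused here.
  public static CharSequence escapeWhiteChar(CharSequence str, Locale locale) {
    String buffer = str.toString();
    for (String white : ESCAPABLE_WHITE_CHARS) {
      // Prefix every occurrence with a backslash, keeping the char itself.
      buffer = buffer.replace(white, "\\" + white);
    }
    return buffer;
  }

  public static void main(String[] args) {
    // The input that makes replaceIgnoreCase() throw.
    System.out.println(escapeWhiteChar("\u0130pone ", Locale.ROOT));
  }
}
{code}

For "İpone " this returns "İpone\ " (the space escaped, the İ untouched) instead of throwing.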
> StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java
> --------------------------------------------------------------------
>
> Key: LUCENE-8572
> URL: https://issues.apache.org/jira/browse/LUCENE-8572
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/queryparser
> Affects Versions: 6.3
> Reporter: Octavian Mocanu
> Priority: Major
>
> With "lucene-queryparser-6.3.0", specifically in
> "org/apache/lucene/queryparser/flexible/standard/parser/EscapeQuerySyntaxImpl.java"
>
> when escaping strings containing extended Unicode characters under a locale
> different from that of the string's character set, the process fails
> with a "java.lang.StringIndexOutOfBoundsException".
>
> The reason is that the comparison is done after first converting all of the
> string's characters to lower case. This can change the length, so the
> original string is no longer the same size as the transformed one but
> shorter, and executing
>
> org/apache/lucene/queryparser/flexible/standard/parser/EscapeQuerySyntaxImpl.java:89
> fails with a java.lang.StringIndexOutOfBoundsException.
> I wonder whether the conversion to lower case is really needed when
> handling the escape chars, since skipping it would avoid the error.
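On the question of whether lowercasing is needed at all: a case-insensitive search can be done without building a lowercased copy. This is a minimal sketch built around a hypothetical helper escapeIgnoreCase (not Lucene API) that uses String.regionMatches(true, ...), which compares char by char and therefore never changes lengths. Note that regionMatches folds case per char without a Locale, so it is not a drop-in replacement for locale-aware matching; for whitespace escape chars, which have no case, the difference should not matter.

{code:java}
public class RegionMatchesSketch {

  // Hypothetical alternative to replaceIgnoreCase(): inserts escapeChar before
  // every case-insensitive occurrence of sequence. All indices refer to the
  // original string, so they can never run past its end.
  public static String escapeIgnoreCase(String string, String sequence, String escapeChar) {
    if (sequence.isEmpty()) {
      return string; // simplification: the empty-sequence case is not handled
    }
    StringBuilder result = new StringBuilder(string.length());
    int i = 0;
    while (i < string.length()) {
      // regionMatches returns false (no exception) if the region would
      // extend past the end of the string.
      if (string.regionMatches(true, i, sequence, 0, sequence.length())) {
        result.append(escapeChar).append(string, i, i + sequence.length());
        i += sequence.length();
      } else {
        result.append(string.charAt(i));
        i++;
      }
    }
    return result.toString();
  }

  public static void main(String[] args) {
    System.out.println(escapeIgnoreCase("\u0130pone ", " ", "\\"));
  }
}
{code}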
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)