[ 
https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705249#comment-16705249
 ] 

Namgyu Kim edited comment on LUCENE-8572 at 11/30/18 8:37 PM:
--------------------------------------------------------------

Hi, [~romseygeek], [~thetaphi].

I looked into the issue, and it appears to be a logic problem.

First, I don't think this is a Locale problem; the problem is in the replace 
algorithm (replaceIgnoreCase) itself.

If you look at escapeWhiteChar(), it calls replaceIgnoreCase() internally:
 (escapeTerm() -> escapeWhiteChar() -> replaceIgnoreCase())

 
{code:java}
private static CharSequence replaceIgnoreCase(CharSequence string,
    CharSequence sequence1, CharSequence escapeChar, Locale locale) {
  // string = "İpone " [304, 112, 111, 110, 101, 32],  size = 6
  ...
  while (start < count) {
    // toLowerCase(locale) changes the length of the string:
    // lowered = "i̇pone " [105, 775, 112, 111, 110, 101, 32], size = 7
    // so firstIndex is set to 6 (the index of ' ' in the lowercased copy).
    if ((firstIndex = string.toString().toLowerCase(locale).indexOf(first,
        start)) == -1)
      break;
    boolean found = true;
    ...
    if (found) {
      // The original string only has indices 0 to 5, but firstIndex (6) was
      // computed on the 7-char lowercased copy, so the substring-based copying
      // in this branch throws a StringIndexOutOfBoundsException.
      result.append(string.toString().substring(copyStart, firstIndex));
      ...
    } else {
      start = firstIndex + 1;
    }
  }
  ...
}
{code}
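
To make the index mismatch concrete, here is a minimal standalone sketch. It is not Lucene code: the class and method names are made up for illustration, and Locale.ROOT is only an assumed example of a non-Turkish locale. It shows that an index found in the lowercased copy can point past the end of the original string.

```java
import java.util.Locale;

final class LowercaseIndexDemo {

  // Index of the first space in the lowercased copy of s (-1 if none).
  static int indexInLowercased(String s, Locale locale) {
    return s.toLowerCase(locale).indexOf(' ');
  }

  // True if using that index on the ORIGINAL string throws.
  static boolean indexOverflowsOriginal(String s, Locale locale) {
    int idx = indexInLowercased(s, locale);
    if (idx < 0) {
      return false; // nothing found, nothing to copy
    }
    try {
      s.substring(idx, idx + 1);
      return false;
    } catch (StringIndexOutOfBoundsException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    String original = "\u0130pone "; // "İpone ", 6 chars
    String lowered = original.toLowerCase(Locale.ROOT); // U+0130 -> 'i' + U+0307

    System.out.println(original.length());                        // 6
    System.out.println(lowered.length());                         // 7
    System.out.println(indexInLowercased(original, Locale.ROOT)); // 6
    System.out.println(indexOverflowsOriginal(original, Locale.ROOT));
  }
}
```

With a Turkish locale the same input lowercases to a single 'i', so the lengths match and the problem does not show up, which is probably why it first looked like a Locale issue.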
 

Fixing this should not be difficult.

But what do you think about using
{code:java}
public static final CharSequence escapeWhiteChar(CharSequence str,
      Locale locale) {
    ...

    for (int i = 0; i < escapableWhiteChars.length; i++) {
      // Use String's replace method, prepending the escape char to each match.
      buffer = buffer.toString().replace(escapableWhiteChars[i],
          "\\" + escapableWhiteChars[i]);
    }
    return buffer;
  }
{code}
instead of
{code:java}
public static final CharSequence escapeWhiteChar(CharSequence str,
      Locale locale) {
    ...

    for (int i = 0; i < escapableWhiteChars.length; i++) {
      // Current implementation.
      buffer = replaceIgnoreCase(buffer,
          escapableWhiteChars[i].toLowerCase(locale), "\\", locale);
    }
    return buffer;
  }
{code}
in the escapeWhiteChar method?
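
As a rough illustration of the String.replace idea, here is a self-contained sketch. The class name and the contents of the white-char array are assumptions for the example (the real escapableWhiteChars array is not shown in this comment), and it assumes that escaping means putting a backslash in front of each white char. Because no lowercased copy is involved, there are no indices that can drift out of range.

```java
final class WhiteCharEscapeSketch {

  // Assumed for illustration only; not the actual Lucene array.
  private static final String[] ESCAPABLE_WHITE_CHARS = {" ", "\t", "\n", "\r", "\f"};

  // Prefix every white char with a backslash using String.replace.
  // No case conversion happens, so the string length is never
  // compared against a transformed copy.
  static String escapeWhiteChar(String str) {
    if (str == null || str.isEmpty()) {
      return str;
    }
    String buffer = str;
    for (String white : ESCAPABLE_WHITE_CHARS) {
      buffer = buffer.replace(white, "\\" + white);
    }
    return buffer;
  }

  public static void main(String[] args) {
    // "İpone " is the input that triggers the exception in replaceIgnoreCase.
    System.out.println(escapeWhiteChar("\u0130pone "));
  }
}
```

Whitespace has no upper or lower case, so dropping the ignore-case handling should not change the result for these characters.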

 



> StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java
> --------------------------------------------------------------------
>
>                 Key: LUCENE-8572
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8572
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 6.3
>            Reporter: Octavian Mocanu
>            Priority: Major
>
> With "lucene-queryparser-6.3.0", specifically in
> "org/apache/lucene/queryparser/flexible/standard/parser/EscapeQuerySyntaxImpl.java"
>  
> when escaping strings containing extended Unicode characters, with a locale 
> different from that of the character set the string uses, the process fails 
> with a "java.lang.StringIndexOutOfBoundsException".
>  
> The reason is that the comparison first converts all of the characters of the 
> string to lower case. After that conversion the original string no longer has 
> the same size as the transformed one (it is shorter), so that executing
>  
> org/apache/lucene/queryparser/flexible/standard/parser/EscapeQuerySyntaxImpl.java:89
> fails with a java.lang.StringIndexOutOfBoundsException.
> I wonder whether the conversion to lower case is really needed when 
> handling the escape chars, since skipping it would avoid the error.


