DO NOT REPLY [Bug 6091] - QueryParser not recognizing asterisk with UTF-8 index

bugzilla Sat, 27 Mar 2004 06:19:55 -0800

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=6091>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://issues.apache.org/bugzilla/show_bug.cgi?id=6091

QueryParser not recognizing asterisk with UTF-8 index

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID



------- Additional Comments From [EMAIL PROTECTED]  2004-03-27 14:20 -------
This is not a Lucene bug. Lucene takes a String, so the caller is responsible 
for making sure the string has been decoded correctly. What happens here is this: 
 
text.getBytes("UTF-8") returns the String as an array of UTF-8 encoded bytes. 
Passing that array to new String() without naming a charset decodes the bytes 
using the platform's default charset (usually iso-8859-1 on Linux). Thus every 
non-ASCII character is misinterpreted and the resulting string is "broken". 
Lucene's analyzers work on the characters they are given, so Lucene relies on 
strings which have not been misinterpreted.
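
The mismatch described above can be reproduced outside Lucene. The following is 
only a minimal sketch, not code from the bug report: the class name, the method 
names and the sample query "café*" are made up for illustration, and 
ISO-8859-1 is passed explicitly to stand in for a platform default charset so 
that the result does not depend on the machine running it. It assumes Java 7 
or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeDemo {
    // Encodes to UTF-8, then decodes as ISO-8859-1. This imitates what
    // new String(bytes) does when the platform default is ISO-8859-1.
    public static String decodeAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encodes to UTF-8 and decodes with the same charset, so the
    // original characters come back unchanged.
    public static String decodeAsUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = "caf\u00e9*"; // hypothetical wildcard query

        // The two UTF-8 bytes of the accented letter become two characters.
        System.out.println(decodeAsLatin1(query).equals(query)); // false
        System.out.println(decodeAsLatin1(query).length());      // 6

        // Naming the charset on both sides keeps the string intact.
        System.out.println(decodeAsUtf8(query).equals(query));   // true
    }
}
```

A string damaged in the first way no longer contains the characters that were 
indexed, which is why the decoding has to be fixed in the calling code before 
the string is handed to QueryParser.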

