http://issues.apache.org/bugzilla/show_bug.cgi?id=6091

QueryParser not recognizing asterisk with UTF-8 index

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

------- Additional Comments From [EMAIL PROTECTED]  2004-03-27 14:20 -------
This is not a Lucene bug. Lucene takes a String, so the caller is
responsible for making sure that the string has been decoded correctly.

What happens here is this: text.getBytes("UTF-8") returns the String as an
array of bytes encoded in UTF-8. Passing that array to new String() without
naming a charset interprets it as a byte sequence in the platform's default
charset (usually iso-8859-1 on Linux). The resulting string is therefore
"broken", i.e. misinterpreted.

Lucene's analyzers rely on strings that have not been misinterpreted in
this way.
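
The effect described above can be reproduced with a small sketch. It is not taken from the bug report: the class name, the method names and the sample query "münch*" are made up for illustration, and ISO-8859-1 is passed explicitly to stand in for a platform default charset, since the actual default depends on the JVM and operating system.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Hypothetical helper: encodes to UTF-8, then decodes with ISO-8859-1.
    // This mimics new String(text.getBytes("UTF-8")) on a platform whose
    // default charset is ISO-8859-1 (an assumption, made explicit here).
    static String decodeWrong(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: decodes with the same charset used to encode,
    // so the original string comes back unchanged.
    static String decodeRight(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = "m\u00fcnch*"; // "münch*", a made-up wildcard query

        // The two UTF-8 bytes of 'ü' are read as two separate characters.
        System.out.println(decodeWrong(query).equals(query)); // false
        System.out.println(decodeRight(query).equals(query)); // true
    }
}
```

In the misdecoded string the asterisk itself survives, but the term in front of it no longer matches what was indexed, so the fix belongs in the calling code: name the charset when decoding, or avoid the byte round trip altogether.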