I'm not sure if this is the answer you are looking for, but I overcame a
similar problem for Finnish characters by modifying the queryparser.jj file to
contain the following lines:
/* ***************** */
/* Token Definitions */
/* ***************** */
<*> TOKEN : {
<#_ALPHA_CHAR: ["a"-"z", "A"-"Z", "ä", "ö", "å", "Ä", "Ö", "Å"] >
| <#_NUM_CHAR: ["0"-"9"] >
| <#_ALPHANUM_CHAR: [ "a"-"z", "A"-"Z", "0"-"9", "ä", "ö", "å", "Ä", "Ö", "Å" ] >
| <#_IDENTIFIER_CHAR: [ "a"-"z", "A"-"Z", "0"-"9", "_", "ä", "ö", "å", "Ä", "Ö", "Å" ] >
| <#_IDENTIFIER: <_ALPHA_CHAR> (<_IDENTIFIER_CHAR>)* >
| <#_NEWLINE: ( "\r\n" | "\r" | "\n" ) >
| <#_WHITESPACE: ( " " | "\t" ) >
| <#_QCHAR: ( "\\" (<_NEWLINE> | ~["a"-"z", "A"-"Z", "0"-"9", "ä", "ö", "å", "Ä", "Ö", "Å"] ) ) >
| <#_RESTOFLINE: (~["\r", "\n"])* >
}
<DEFAULT> TOKEN : {
<AND: ("AND" | "&&" | "and") >
| <OR: ("OR" | "||" | "or") >
| <NOT: ("NOT" | "!" | "not") >
| <PLUS: "+" >
| <MINUS: "-" >
| <LPAREN: "(" >
| <RPAREN: ")" >
| <COLON: ":" >
| <CARAT: "^" >
| <STAR: "*" >
| <QUOTED: "\"" (~["\""])+ "\"">
| <NUMBER: (<_NUM_CHAR>)+ "." (<_NUM_CHAR>)+ >
| <TERM: <_IDENTIFIER_CHAR>
( ~["\"", " ", "\t", "(", ")", ":", "&", "|", "^", "*" ] )* >
}
<DEFAULT> SKIP : {
<<_WHITESPACE>>
}
<DEFAULT> TOKEN : {
<ALL: (~[]) >
}
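
To illustrate why this helps: a TERM has to start with an _IDENTIFIER_CHAR, so
a query term whose first character is outside that class is rejected by the
lexer at column 1. The small Java sketch below only mimics that membership
test; it is not Lucene code. The class name, the method names and the exact
list of extra letters are my own assumptions for the example.

```java
public class AlphaCharDemo {
    // Hypothetical stand-in for an ASCII-only letter class: a-z and A-Z.
    public static boolean isAsciiAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Assumed extra letters for the example: a-umlaut, o-umlaut, a-ring,
    // in lower and upper case, written as escapes to avoid encoding trouble.
    private static final String EXTRA_LETTERS =
            "\u00e4\u00f6\u00e5\u00c4\u00d6\u00c5";

    // Hypothetical stand-in for the extended letter class.
    public static boolean isExtendedAlpha(char c) {
        return isAsciiAlpha(c) || EXTRA_LETTERS.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        char c = '\u00f6'; // code point 246, the character from the error report
        System.out.println((int) c + " in ASCII-only class: " + isAsciiAlpha(c));
        System.out.println((int) c + " in extended class:   " + isExtendedAlpha(c));
    }
}
```

Listing each letter explicitly keeps the change small, but every further
language needs its own additions to the same character classes.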
Doug Cutting (22/10/2001 16:39):
>Brian,
>
>Do you know what's going on here? I have not yet had time to look at this.
>If you don't have time, and no one else volunteers, then I will look into
>it. I would like to fix this for the 1.2 final release, if the change required
>is not major.
>
>Doug
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>> Sent: Monday, October 22, 2001 8:56 AM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Re: [Lucene-dev] Katakana characters in queries (a bug?)
>>
>>
>>
>>
>> Hi,
>>
>> yes, I can confirm this bug. I have the same problem
>> with query terms starting with German umlauts like 'ä', 'ö'
>> and 'ü':
>>
>> Exception occurred during event dispatching:
>> org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 1.
>> Encountered: "\u00f6" (246), after : ""
>> at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown Source)
>> at org.apache.lucene.queryParser.QueryParser.jj_ntk(Unknown Source)
>> at org.apache.lucene.queryParser.QueryParser.Modifiers(Unknown Source)
>> at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
>> at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
>> at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
>> ...
>>
>> The problem occurs in Lucene 1.2 RC1 and RC2.
>>
>> Regards,
>> Ralf Zimmermann
>>
>>
------------------------------------------------------------------------
Joanne Sproston | Software Engineer
Teamware Group
[EMAIL PROTECTED]
phone: +44 (0)1782 794879 fax: +44 (0)1782 776667
intra / extra / Internet solutions at www.teamware.com