I'm not sure if this is the answer you are looking for, but I overcame a
similar problem with Finnish characters by modifying the queryparser.jj file to
contain the following lines:

/* ***************** */
/* Token Definitions */
/* ***************** */

<*> TOKEN : {
  <#_ALPHA_CHAR: ["a"-"z", "A"-"Z", "å", "ä", "ö", "Å", "Ä", "Ö"] >
| <#_NUM_CHAR:   ["0"-"9"] >
| <#_ALPHANUM_CHAR: [ "a"-"z", "A"-"Z", "0"-"9", "å", "ä", "ö", "Å", "Ä", "Ö" ] >
| <#_IDENTIFIER_CHAR: [ "a"-"z", "A"-"Z", "0"-"9", "_", "å", "ä", "ö", "Å", "Ä", "Ö" ] >
| <#_IDENTIFIER: <_ALPHA_CHAR> (<_IDENTIFIER_CHAR>)* >
| <#_NEWLINE:    ( "\r\n" | "\r" | "\n" ) >
| <#_WHITESPACE: ( " " | "\t" ) >
| <#_QCHAR:      ( "\\" (<_NEWLINE> | ~["a"-"z", "A"-"Z", "0"-"9", "å", "ä", "ö", "Å", "Ä", "Ö"] ) ) >
| <#_RESTOFLINE: (~["\r", "\n"])* >
}

<DEFAULT> TOKEN : {
  <AND:       ("AND" | "&&" | "and") >
| <OR:        ("OR" | "||" | "or") >
| <NOT:       ("NOT" | "!" | "not") >
| <PLUS:      "+" >
| <MINUS:     "-" >
| <LPAREN:    "(" >
| <RPAREN:    ")" >
| <COLON:     ":" >
| <CARAT:     "^" >
| <STAR:      "*" >
| <QUOTED:     "\"" (~["\""])+ "\"">
| <NUMBER:    (<_NUM_CHAR>)+ "." (<_NUM_CHAR>)+ >
| <TERM:      <_IDENTIFIER_CHAR>
              ( ~["\"", " ", "\t", "(", ")", ":", "&", "|", "^", "*" ] )* >
}

<DEFAULT> SKIP : {
  <<_WHITESPACE>>
}

<DEFAULT> TOKEN : {
<ALL:       (~[]) >
}
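To see which characters the modified classes accept, here is a small Java sketch that mirrors the _ALPHA_CHAR range test by hand. It is hypothetical and not part of Lucene: the class and method names are made up, asciiAlpha is assumed to match the unmodified grammar's ASCII-only ranges, and the six Finnish letters are assumed to be å, ä, ö, Å, Ä, Ö. The unicodeAlpha check using Character.isLetter is a swapped-in comparison of my own, not something the grammar above does.

```java
/**
 * Hypothetical sketch (not Lucene code): hand-written equivalents of the
 * _ALPHA_CHAR character class, to check which characters may start a term.
 */
class AlphaCharCheck {

    // Assumed to match the unmodified grammar: ASCII letters only.
    static boolean asciiAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Assumed Finnish additions: å ä ö Å Ä Ö, written as escapes so the
    // source file's encoding cannot garble them.
    static final String FINNISH = "\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6";

    // Extended class from the modified grammar: ASCII letters plus FINNISH.
    static boolean extendedAlpha(char c) {
        return asciiAlpha(c) || FINNISH.indexOf(c) >= 0;
    }

    // Swapped-in alternative (not in the grammar): any Unicode letter.
    static boolean unicodeAlpha(char c) {
        return Character.isLetter(c);
    }

    public static void main(String[] args) {
        // 'a', ö (the character in the reported error), ü, Katakana KA
        char[] samples = {'a', '\u00f6', '\u00fc', '\u30ab'};
        for (char c : samples) {
            System.out.printf("U+%04X (%d): ascii=%b extended=%b unicode=%b%n",
                    (int) c, (int) c, asciiAlpha(c), extendedAlpha(c), unicodeAlpha(c));
        }
    }
}
```

Running it shows ö (code 246) rejected by the ASCII-only ranges but accepted by the extended class, while ü and the Katakana letter pass only the Character.isLetter check. So enumerating letters in the grammar fixes Finnish input but would not by itself cover German umlauts like ü or Katakana.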

Doug Cutting  (22/10/2001  16:39):
>Brian,
>
>Do you know what's going on here?  I have not yet had time to look at this.
>If you don't have time, and no one else volunteers, then I will look into
>it.  I would like to fix this for the 1.2 final release, if the change required
>is not major.
>
>Doug
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>> Sent: Monday, October 22, 2001 8:56 AM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Re: [Lucene-dev] Katakana characters in queries (a bug?)
>>
>> Hi,
>>
>> Yes, I can confirm this bug. I have the same problem
>> with query terms starting with German umlauts like 'ä', 'ö'
>> and 'ü':
>>
>> Exception occurred during event dispatching:
>> org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 1.
>> Encountered: "\u00f6" (246), after : ""
>>      at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown Source)
>>      at org.apache.lucene.queryParser.QueryParser.jj_ntk(Unknown Source)
>>      at org.apache.lucene.queryParser.QueryParser.Modifiers(Unknown Source)
>>      at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
>>      at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
>>      at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
>>      ...
>>
>> The problem occurs in Lucene 1.2 RC1 and RC2.
>>
>> Regards,
>> Ralf Zimmermann
>>
>>

------------------------------------------------------------------------

Joanne Sproston | Software Engineer
Teamware Group
[EMAIL PROTECTED]
phone: +44 (0)1782 794879  fax: +44 (0)1782  776667

intra / extra / Internet solutions at www.teamware.com
