Hello,
I think there is a problem in the token definitions that causes a
ParseException whenever a term starts with a non-English character.
Joanne's solution helps for the three extra letters it adds, but not
for any other characters.
A TERM is defined as:
<TERM: <_IDENTIFIER_CHAR>
( ~["\"", " ", "\t", "(", ")", ":", "&", "|", "^", "*",
"?", "~", "{", "}", "[", "]" ] )* >
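That means a term must begin with an IDENTIFIER_CHAR (only ASCII
letters, digits and "_" in the stock grammar), followed by any number
of characters that are not in the excluded list. I don't think the
first character needs to be an IDENTIFIER_CHAR, so my proposal is: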
<TERM: ( ~["\"", " ", "\t", "(", ")", ":", "&", "|", "^", "*", "?",
"~", "{", "}", "[", "]" ] )+ >
Also, the IDENTIFIER, ALPHA_CHAR and ALPHANUM_CHAR tokens are
defined but not used.
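To make the difference between the current and the proposed TERM rule concrete, here is a rough sketch that approximates both with java.util.regex. It is only an illustration, not Lucene code: the class TermTokenSketch and its members are hypothetical names, it assumes the stock IDENTIFIER_CHAR is just ASCII letters, digits and "_", and it checks whether a whole word matches each rule rather than reproducing JavaCC's tokenizer (escapes and longest-match are ignored).

```java
import java.util.regex.Pattern;

// Hypothetical helper: approximates the current and proposed TERM rules.
// Assumes the stock _IDENTIFIER_CHAR is [a-zA-Z0-9_]; escapes are ignored.
public class TermTokenSketch {

    // Any character except the delimiters excluded by TERM:
    // double quote, space, tab, ( ) : & | ^ * ? ~ { } [ ]
    private static final String NON_DELIM = "[^\" \\t():&|^*?~{}\\[\\]]";

    // Current rule: one _IDENTIFIER_CHAR, then zero or more non-delimiters.
    public static final Pattern CURRENT_TERM =
            Pattern.compile("[a-zA-Z0-9_]" + NON_DELIM + "*");

    // Proposed rule: one or more non-delimiters, no special first character.
    public static final Pattern PROPOSED_TERM =
            Pattern.compile(NON_DELIM + "+");

    // True if the whole word matches the given rule.
    public static boolean isTerm(Pattern rule, String word) {
        return rule.matcher(word).matches();
    }

    public static void main(String[] args) {
        String[] words = {
            "lucene",
            "\u00f6l",                    // starts with o-umlaut, as in the trace below
            "\u00fcber",                  // starts with u-umlaut
            "\u30ab\u30bf\u30ab\u30ca",   // a Katakana word
            "(foo",                       // starts with a delimiter
            "+foo"                        // starts with the PLUS operator
        };
        for (String w : words) {
            System.out.printf("%-8s current=%-5b proposed=%b%n",
                    w, isTerm(CURRENT_TERM, w), isTerm(PROPOSED_TERM, w));
        }
    }
}
```

Run as-is, it should report that "öl", "über" and the Katakana word fail the current rule but pass the proposed one, while "(foo" fails both. It also shows "+foo" passing the proposed rule; since JavaCC prefers the longest match, the lexer might then read +foo as a single TERM instead of PLUS followed by TERM, so "+", "-" and "!" may also need to be excluded from the first position.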
peter
P.S. I don't understand the definition of WILD_TERM. It states that a
wildcard term must end with an IDENTIFIER_CHAR, so it cannot end with
"*". Is that the intended definition?
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]
> Sent: Friday, October 26, 2001 6:42 PM
> To: Doug Cutting; 'Brian Goetz'; [EMAIL PROTECTED]
> Subject: Re(2): Re: [Lucene-dev] Katakana characters in queries (a bug?)
>
>
> I'm not sure if this is the answer you are looking for, but I overcame
> a similar problem with Finnish characters by modifying the
> queryparser.jj file to contain the following lines:
>
> /* ***************** */
> /* Token Definitions */
> /* ***************** */
>
> <*> TOKEN : {
> <#_ALPHA_CHAR: ["a"-"z", "A"-"Z", "å", "ä", "ö", "Å", "Ä", "Ö"] >
> | <#_NUM_CHAR: ["0"-"9"] >
> | <#_ALPHANUM_CHAR: [ "a"-"z", "A"-"Z", "0"-"9",
>                       "å", "ä", "ö", "Å", "Ä", "Ö" ] >
> | <#_IDENTIFIER_CHAR: [ "a"-"z", "A"-"Z", "0"-"9", "_",
>                         "å", "ä", "ö", "Å", "Ä", "Ö" ] >
> | <#_IDENTIFIER: <_ALPHA_CHAR> (<_IDENTIFIER_CHAR>)* >
> | <#_NEWLINE: ( "\r\n" | "\r" | "\n" ) >
> | <#_WHITESPACE: ( " " | "\t" ) >
> | <#_QCHAR: ( "\\" (<_NEWLINE> | ~["a"-"z", "A"-"Z", "0"-"9",
>                                   "å", "ä", "ö", "Å", "Ä", "Ö"] ) ) >
> | <#_RESTOFLINE: (~["\r", "\n"])* >
> }
>
> <DEFAULT> TOKEN : {
> <AND: ("AND" | "&&" | "and") >
> | <OR: ("OR" | "||" | "or") >
> | <NOT: ("NOT" | "!" | "not") >
> | <PLUS: "+" >
> | <MINUS: "-" >
> | <LPAREN: "(" >
> | <RPAREN: ")" >
> | <COLON: ":" >
> | <CARAT: "^" >
> | <STAR: "*" >
> | <QUOTED: "\"" (~["\""])+ "\"">
> | <NUMBER: (<_NUM_CHAR>)+ "." (<_NUM_CHAR>)+ >
> | <TERM: <_IDENTIFIER_CHAR>
>          ( ~["\"", " ", "\t", "(", ")", ":", "&", "|", "^", "*" ] )* >
> }
>
> <DEFAULT> SKIP : {
> <<_WHITESPACE>>
> }
>
> <DEFAULT> TOKEN : {
> <ALL: (~[]) >
> }
>
>
> Doug Cutting (22/10/2001 16:39):
> >Brian,
> >
> >Do you know what's going on here? I have not yet had time to look at
> >this. If you don't have time, and no one else volunteers, then I will
> >look into it. I would like to fix this for the 1.2 final release, if
> >the change required is not major.
> >
> >Doug
> >
> >> -----Original Message-----
> >> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> >> Sent: Monday, October 22, 2001 8:56 AM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: Re: [Lucene-dev] Katakana characters in queries (a bug?)
> >>
> >> Hi,
> >>
> >> Yes, I can confirm this bug. I have the same problem with query
> >> terms starting with German umlauts like 'ä', 'ö' and 'ü':
> >>
> >> Exception occurred during event dispatching:
> >> org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 1.
> >> Encountered: "\u00f6" (246), after : ""
> >>     at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown Source)
> >>     at org.apache.lucene.queryParser.QueryParser.jj_ntk(Unknown Source)
> >>     at org.apache.lucene.queryParser.QueryParser.Modifiers(Unknown Source)
> >>     at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
> >>     at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
> >>     at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
> >>     ...
> >>
> >> The problem occurs in Lucene 1.2 RC1 and RC2.
> >>
> >> Regards,
> >> Ralf Zimmermann
> >>
> >>
>
> ------------------------------------------------------------------------
>
> Joanne Sproston | Software Engineer
> Teamware Group
> [EMAIL PROTECTED]
>