Hello,

> My dataset seems to have a similar problem: the chemical name
> alpha-androstane-3 and several others exist in the given text.
> Can anyone point out the best strategy for indexing words
> containing - _ + as they are, without mutilating them?

You have to use or write an Analyzer that doesn't split tokens on
non-letter characters such as - _ +.
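
A minimal sketch of the idea, written in plain Java rather than against
the Lucene API so it runs on its own. The class name SymbolTokenizer and
the helper tokenize are hypothetical and only illustrate the rule. In
Lucene the same character test would typically go into the isTokenChar
method of a CharTokenizer subclass, wrapped in your own Analyzer. If
splitting on whitespace alone is enough for your data, WhitespaceAnalyzer
may already do what you need.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of Lucene.
public class SymbolTokenizer {

    // Decide which characters belong to a token.
    // Assumption: letters, digits, '-', '_' and '+' should all be kept.
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == '+';
    }

    // Split text into lower-cased tokens, breaking only on characters
    // for which isTokenChar returns false.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [the, name, alpha-androstane-3, and, foo_bar, c++]
        System.out.println(tokenize("The name alpha-androstane-3 and foo_bar, C++"));
    }
}
```

Whatever rule you settle on, the same Analyzer should presumably be
handed to QueryParser as well, so that query terms are tokenized the
same way as the indexed text.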

> Currently on my indexes the StandardAnalyzer and QueryParser break up
> alpha-androstane-3 into TEXT:alpha -TEXT:androstane -TEXT:3, where
> TEXT is the Field to be searched.

Hm, I thought we had fixed QueryParser so that it no longer does this.
Are you using Lucene 1.4?

Otis

> If I enclose alpha-androstane-3 as a phrase, "alpha-androstane-3",
> then the QueryParser breaks it down to ABSTRACT:"alpha androstane-3".
> Somehow the first "-" disappears?
> 
> 
>  regards
> 
>  Rupinder
> 
> >-----Original Message-----
> >From: Marcus Rau [mailto:[EMAIL PROTECTED]
> >Sent: 29 July 2004 11:48
> >To: [EMAIL PROTECTED]
> >Subject: Allow non letter characters in tokens
> >
> >
> >Hi there,
> >
> >my question is a pretty short one!
> >
> >How can I prevent Lucene from cutting out special characters (e.g. the
> >"_") during tokenization of a text? It's quite essential for me to have
> >some non-letter chars in my index.
> >
> >Regards
> >Marcus
> >
> >
>
>---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >

