RE: Allow non letter characters in tokens

Rupinder Singh Mazara Thu, 29 Jul 2004 04:10:56 -0700

Hi all

  My dataset seems to have a similar problem: the chemical name
alpha-androstane-3, and several others like it, exist in the given
text. Can anyone point out the best strategy for indexing words that
contain - _ + exactly as they are, so that they are not mangled?
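
  To make the goal concrete, here is a minimal plain-Java sketch of the
tokenization being asked for: a token is any run of letters, digits, or
one of - _ +, and everything else acts as a separator. The class name
KeepSymbolsTokenizer and the EXTRA constant are hypothetical and only for
illustration. Lucene's CharTokenizer appears to offer a similar
isTokenChar hook, but that is an assumption to check against the version
in use; this is not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

public class KeepSymbolsTokenizer {
    // Assumption: the extra characters that should survive inside a token.
    private static final String EXTRA = "-_+";

    // A character belongs to a token if it is a letter, a digit, or one of EXTRA.
    public static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || EXTRA.indexOf(c) >= 0;
    }

    // Split text into lower-cased tokens, treating every non-token character
    // (whitespace, commas, full stops, ...) as a separator.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [the, steroid, alpha-androstane-3, was, assayed]
        System.out.println(tokenize("The steroid alpha-androstane-3, was assayed."));
    }
}
```

  Whatever rule is chosen, it would have to be applied both when indexing
and when analyzing queries, otherwise the indexed terms and the query
terms will not match.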



  Currently, on my indexes, the StandardAnalyzer and QueryParser break up
alpha-androstane-3 into TEXT:alpha -TEXT:androstane -TEXT:3, where TEXT
is the field to be searched.

  If I enclose alpha-androstane-3 as a phrase, "alpha-androstane-3", then
the QueryParser breaks it down to ABSTRACT:"alpha androstane-3"; somehow
the first "-" disappears?
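
  The "-" in front of the second and third clauses above looks like the
hyphen being read as the query syntax's prohibit operator, although that
is a guess from the output alone. Lucene's query syntax documents a
backslash as the escape character for such operators, so one thing to try
is escaping the term before it reaches the QueryParser. The sketch below
is a hypothetical stand-alone helper (QueryEscapeSketch and its SPECIAL
list are my own names, not Lucene API); some Lucene versions may ship an
equivalent escape method on QueryParser, which would be preferable if
present.

```java
public class QueryEscapeSketch {
    // Assumption: the characters treated as query-syntax operators.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\&|";

    // Prefix every special character with a backslash so the parser
    // reads it as part of the term rather than as an operator.
    public static String escape(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: alpha\-androstane\-3
        System.out.println(escape("alpha-androstane-3"));
    }
}
```

  Escaping only affects how the query string is parsed. If the analyzer
still splits on "-", the term will be broken up again after parsing, so
this would need to go together with an analyzer that keeps the hyphen.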


 regards

 Rupinder

>-----Original Message-----
>From: Marcus Rau [mailto:[EMAIL PROTECTED]
>Sent: 29 July 2004 11:48
>To: [EMAIL PROTECTED]
>Subject: Allow non letter characters in tokens
>
>
>Hi there,
>
>my question is a pretty short one!
>
>How can I prevent Lucene from cutting out special characters (i.e. the
>"_") during tokenization of a text? It's quite essential for me to have
>some non letter chars in my index.
>
>Regards
>Marcus
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
>
>


