Hi Nilesh,

Which version of Lucene are you using?  StandardTokenizer behavior changed in 
v3.1.
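
For reference, here is a rough sketch of why a leading digit could matter under the older grammar. It assumes the pre-3.1 rule, as I recall it, that joins alphanumeric segments across a single "_", "-", "/", "." or "," only when every other segment contains a digit. The class and method names are made up for illustration, this is not Lucene code, and it ignores the other rules in the real grammar (email addresses, hostnames, acronyms, apostrophes).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the old "NUM" rule; not part of Lucene.
final class ClassicNumRuleSketch {

    // Punctuation assumed to be allowed between segments of one token.
    private static boolean isJoiner(char c) {
        return c == '_' || c == '-' || c == '/' || c == '.' || c == ',';
    }

    private static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) return true;
        }
        return false;
    }

    // Returns the index just past the run of letters/digits starting at 'from'.
    private static int segmentEnd(String text, int from) {
        int i = from;
        while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
        return i;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) { i++; continue; }
            int start = i;
            int pos = segmentEnd(text, start);
            int best = pos;                       // longest valid token end so far
            // A joined token is valid if all even-indexed segments have a digit,
            // or all odd-indexed segments have a digit.
            boolean evenOk = hasDigit(text.substring(start, pos));
            boolean oddOk = true;
            int index = 1;
            while ((evenOk || oddOk)
                    && pos + 1 < text.length()
                    && isJoiner(text.charAt(pos))
                    && Character.isLetterOrDigit(text.charAt(pos + 1))) {
                int end = segmentEnd(text, pos + 1);
                boolean digit = hasDigit(text.substring(pos + 1, end));
                if (index % 2 == 0) evenOk &= digit; else oddOk &= digit;
                if (evenOk || oddOk) best = end;  // extend only while still valid
                pos = end;
                index++;
            }
            tokens.add(text.substring(start, best));
            i = best;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("01a_b-_-c-d")); // [01a_b, c, d]
        System.out.println(tokenize("a_b-_-c_d"));   // [a, b, c, d]
    }
}
```

Under that assumption, "01a" contains a digit, so "01a_b" stays together, while "a_b" has no digit on either side of the underscore and splits. The "-_-" runs never join anything because the joiner must sit directly between two alphanumeric segments.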

Steve

-----Original Message-----
From: Nilesh Vijaywargiay [mailto:nilesh.vi...@gmail.com] 
Sent: Tuesday, March 27, 2012 2:04 PM
To: java-user@lucene.apache.org
Subject: Lucene tokenization

I have a string 01a_b-_-c-d which is tokenized as 01a_b c d

and the string a_b-_-c_d which is tokenized as a b c d

Why is there a difference when there is a digit at the beginning? I am using 
the standard unstemmed tokenizer.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
