You'd have to modify the JFlex grammar. I'd suggest adding in a generic "protected words" approach whereby you can pass in a list of protected words.

This would be a nice patch/improvement.
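
To make the "protected words" idea concrete, here is a rough standalone sketch in plain Java. It is not the JFlex grammar change itself and does not use any Lucene API; the class name ProtectedWordTokenizer and its methods are hypothetical, and the fallback rule (runs of letters/digits) is only a stand-in for the real StandardTokenizer logic. It just shows the matching order: try the protected list first, then fall back to normal tokenization.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not part of Lucene, not generated by JFlex.
final class ProtectedWordTokenizer {
    private final Set<String> protectedWords;

    ProtectedWordTokenizer(Set<String> protectedWords) {
        this.protectedWords = new HashSet<>(protectedWords);
    }

    // Splits text into tokens, emitting protected words whole.
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = longestProtectedAt(text, i);
            if (len > 0) {
                // A protected word starts here: keep it intact (original casing).
                tokens.add(text.substring(i, i + len));
                i += len;
            } else if (Character.isLetterOrDigit(text.charAt(i))) {
                // Simplified fallback: a token is a run of letters/digits.
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else {
                i++; // skip whitespace and punctuation
            }
        }
        return tokens;
    }

    // Length of the longest protected word that matches (ignoring case) at pos
    // and is not immediately followed by a letter or digit; 0 if none matches.
    private int longestProtectedAt(String text, int pos) {
        int best = 0;
        for (String w : protectedWords) {
            int end = pos + w.length();
            boolean matches = text.regionMatches(true, pos, w, 0, w.length());
            boolean atBoundary = end >= text.length()
                    || !Character.isLetterOrDigit(text.charAt(end));
            if (matches && atBoundary && w.length() > best) {
                best = w.length();
            }
        }
        return best;
    }
}
```

With the protected list "c++", "c#", ".net", calling tokenize("I use C++, c# and .NET daily") gives the tokens I, use, C++, c#, and, .NET, daily. In a real patch the same precedence would presumably live in the grammar or in the tokenizer that wraps it, with the list passed in by the caller.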

-Grant

On Jun 3, 2009, at 4:07 AM, ami dudu wrote:


Hi, I'm using StandardTokenizer, which does a great job for me, but I need to enhance it somehow so that words like "c++", "c#", and ".net" are kept as is and not
tokenized into "c" or "net".
I know that there are other tokenizers such as KeywordTokenizer and
WhitespaceTokenizer but they do not include the StandardTokenizer logic.
Any ideas on what is the best way to add this enhancement?

Thanks,
Amid
--
View this message in context: 
http://www.nabble.com/Enhance-StandardTokenizer-to-support-words-which-will-not-be-tokenized-tp23849495p23849495.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search


