You'd have to modify the JFlex grammar. I'd suggest adding in a
generic "protected words" approach whereby you can pass in a list of
protected words.
This would be a nice patch/improvement.
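To make the idea a bit more concrete, here is a rough sketch of "protected words" in plain Java. It is not the JFlex change itself and it does not reproduce StandardTokenizer's rules; the class name ProtectedWordTokenizer and its tokenize method are made up for illustration, and the sketch assumes the protected words are passed in already lowercased. It only shows the matching order: try the longest protected word at the current position first, and fall back to ordinary letter/digit runs otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: not a Lucene class and not the JFlex grammar.
public class ProtectedWordTokenizer {

    /**
     * Splits text into lowercase tokens. Any entry of protectedWords
     * (assumed to be lowercase already) is emitted as a single token.
     */
    public static List<String> tokenize(String text, Set<String> protectedWords) {
        String lower = text.toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < lower.length()) {
            String match = longestProtectedAt(lower, i, protectedWords);
            if (match != null) {
                // A protected word wins over the default splitting rules.
                tokens.add(match);
                i += match.length();
            } else if (Character.isLetterOrDigit(lower.charAt(i))) {
                // Default rule in this sketch: a run of letters/digits is one token.
                int start = i;
                while (i < lower.length() && Character.isLetterOrDigit(lower.charAt(i))) {
                    i++;
                }
                tokens.add(lower.substring(start, i));
            } else {
                // Punctuation and whitespace outside protected words are dropped.
                i++;
            }
        }
        return tokens;
    }

    /** Returns the longest protected word starting at pos on a word boundary, or null. */
    private static String longestProtectedAt(String s, int pos, Set<String> protectedWords) {
        // Do not start a protected word in the middle of (or glued to) another word.
        if (pos > 0 && Character.isLetterOrDigit(s.charAt(pos - 1))) {
            return null;
        }
        String best = null;
        for (String w : protectedWords) {
            int end = pos + w.length();
            // The match must not run straight into more letters (".net" vs ".network").
            boolean endsCleanly = end >= s.length() || !Character.isLetterOrDigit(s.charAt(end));
            if (!w.isEmpty() && s.startsWith(w, pos) && endsCleanly
                    && (best == null || w.length() > best.length())) {
                best = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> protectedWords = Set.of("c++", "c#", ".net");
        // prints [i, use, c++, c#, and, .net, daily]
        System.out.println(tokenize("I use C++, C# and .NET daily", protectedWords));
    }
}
```

In a real patch the same list would be consulted from inside the generated scanner (or the grammar would get an extra rule), so that the rest of StandardTokenizer's logic stays intact.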
-Grant
On Jun 3, 2009, at 4:07 AM, ami dudu wrote:
Hi, I'm using StandardTokenizer, which does a great job for me, but I
need to enhance it somehow so that words like "c++", "c#", and ".net"
are kept as is and not tokenized into "c" or "net".
I know that there are other tokenizers such as KeywordTokenizer and
WhitespaceTokenizer but they do not include the StandardTokenizer
logic.
Any ideas on what is the best way to add this enhancement?
Thanks,
Amid
--
View this message in context:
http://www.nabble.com/Enhance-StandardTokenizer-to-support-words-which-will-not-be-tokenized-tp23849495p23849495.html
Sent from the Lucene - Java Developer mailing list archive at
Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search