Extending StandardTokenizer Jflex to not split on '/'

Diego Fernandez Fri, 14 Feb 2014 10:44:29 -0800

Hi guys, this is my first time posting on the Lucene list, so hello everyone.

I really like the way the StandardTokenizer works; however, I'd like it not to
split tokens on / (forward slash).  I've been looking at
http://unicode.org/reports/tr29/#Default_Word_Boundaries to try to understand
the rules, but I'm either misunderstanding or missing something.  If I
understand correctly, the characters in MidLetter keep it from splitting a
token as long as there are alphabetic characters on either side.  I tried
adding the forward slash to the MidLetter and MidLetterSupp rules (in
different combinations), but it still seems to split on it.
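To make the rule in question concrete, here is a minimal Java sketch of UAX #29 rules WB5 through WB7 (adjacent letters stay together, and a MidLetter character between two letters does not cause a break). It is a hypothetical toy model, not Lucene's StandardTokenizer or its JFlex grammar: the class name MidLetterSketch is made up, letters are approximated with Character.isLetter, and the MidLetter sets passed in are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of UAX #29 word-boundary rules:
 *   WB5: letter x letter            (no break between letters)
 *   WB6/WB7: letter x MidLetter x letter (no break around a MidLetter char)
 * NOT Lucene's StandardTokenizer. Letters are approximated with
 * Character.isLetter, and every other character ends the current token.
 */
final class MidLetterSketch {

    // Characters treated as MidLetter; supplied by the caller (an assumption).
    private final Set<Character> midLetter;

    MidLetterSketch(Set<Character> midLetter) {
        this.midLetter = midLetter;
    }

    /** Splits text into letter runs, keeping a MidLetter char only between two letters. */
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean prevIsLetter = i > 0 && Character.isLetter(text.charAt(i - 1));
            boolean nextIsLetter = i + 1 < text.length() && Character.isLetter(text.charAt(i + 1));
            if (Character.isLetter(c)) {
                current.append(c);              // WB5: letters stay together
            } else if (midLetter.contains(c) && prevIsLetter && nextIsLetter) {
                current.append(c);              // WB6/WB7: letter, MidLetter, letter
            } else {
                flush(current, tokens);         // anything else is a boundary
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Emits the pending token, if any, and resets the buffer.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}
```

With a set containing '/', `tokenize("and/or")` keeps the single token `and/or`, while a set without it yields `and` and `or`. As in WB6/WB7, the slash only survives between two letters: `a//b` still splits, and so would something like `10/20`, since UAX #29 joins digits through the separate MidNum rules (WB11/WB12) rather than MidLetter.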

Does anyone have any tips or ideas?

Thanks

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics


