Hi guys, this is my first time posting on the Lucene list, so hello everyone.
I really like the way the StandardTokenizer works; however, I'd like it not to split tokens on / (forward slash).

I've been looking at http://unicode.org/reports/tr29/#Default_Word_Boundaries to try to understand the rules, but I'm either misunderstanding or missing something. If I understand correctly, the characters in MidLetter keep a token from being split as long as there are alphabetic characters on both sides. I tried adding the forward slash to the MidLetter and MidLetterSupp rules (in several different combinations), but it still seems to split on it.

Does anyone have any tips or ideas?

Thanks

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics
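
P.S. To make my reading of the MidLetter rules (WB6/WB7 in UAX #29) concrete, here is a rough sketch in plain Java. It is not Lucene code and not how StandardTokenizer is implemented (that scanner is generated from a JFlex grammar). The class name, the tokenize method and the sets of mid-letter characters are all made up for illustration, and the sketch ignores digits and every other word-break rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of the behaviour I expect: a "mid-letter" character stays inside
// a token only when it has a letter directly before AND directly after it.
public class MidLetterSketch {

    // midLetters is a hypothetical stand-in for the MidLetter character class.
    static List<String> tokenize(String text, Set<Character> midLetters) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // A non-empty current token means i > 0, so charAt(i - 1) is safe.
            boolean joins = midLetters.contains(c)
                    && current.length() > 0
                    && Character.isLetter(text.charAt(i - 1))
                    && i + 1 < text.length()
                    && Character.isLetter(text.charAt(i + 1));
            if (Character.isLetter(c) || joins) {
                current.append(c);
            } else if (current.length() > 0) {
                // Any other character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "either/or a/ /b";
        // Slash not treated as mid-letter: prints [either, or, a, b]
        System.out.println(tokenize(text, Set.of(':', '.')));
        // Slash treated as mid-letter: prints [either/or, a, b]
        System.out.println(tokenize(text, Set.of(':', '.', '/')));
    }
}
```

The second call shows the result I'm after: the slash survives only between two letters, and a leading or trailing slash is still dropped.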