Hi all,
Has anyone had any luck using StandardTokenizer with Unicode characters beyond the Latin-1 range? I have tried it on Cyrillic text (U+0400..U+04FF), and it looks like the characters don't get through, even though the Cyrillic range IS included in StandardTokenizer.jj (it is one of the Unicode ranges used to define the Letter token).

If I set UNICODE_INPUT = true in StandardTokenizer.jj (and disable USER_CHAR_STREAM = true), it works perfectly. So does that mean I have to maintain my own version of StandardTokenizer to get Unicode input working?

Boris Okner
