Hi everybody,
I have just found myself in the situation of having to subclass CharTokenizer with a class that tests against Character.isLetterOrDigit. I would use a LetterTokenizer, but it's important for me to allow numbers through, as the documents I'm indexing often have dates such as '2000' or '1945'.
Obviously it's only a few lines to do this, but I'm sure I'm not the first person to have had to do it. May I make the feature request that LetterTokenizer should have an 'AllowDigits' property?
Apologies if this has been discussed earlier. I googled for the relevant terms and found nothing.
Thanks, Peter Pimley, Semantico.
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]