Hi everybody,

I have just found myself having to subclass CharTokenizer with a class that tests against Character.isLetterOrDigit. I would use a LetterTokenizer, but I need digits to pass through, because the documents I'm indexing often contain years such as '2000' or '1945'.

Obviously it's only a few lines of code, but I'm sure I'm not the first person to have needed it. May I request, as a feature, that LetterTokenizer get an 'AllowDigits' property?
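For illustration, here is a minimal standalone sketch (no Lucene dependency) of the idea. In Lucene, the workaround is a CharTokenizer subclass whose isTokenChar returns Character.isLetterOrDigit(c) (in Lucene releases of this period the method takes a char; later releases take an int code point). The class AllowDigitsTokenizerSketch and its allowDigits flag are hypothetical and only show what the requested property would toggle. They are not an existing LetterTokenizer option.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Lucene-free sketch of the CharTokenizer idea:
 * a token is a maximal run of characters accepted by isTokenChar.
 * The allowDigits flag models the requested 'AllowDigits' property.
 */
public class AllowDigitsTokenizerSketch {

    private final boolean allowDigits;

    public AllowDigitsTokenizerSketch(boolean allowDigits) {
        this.allowDigits = allowDigits;
    }

    // Plays the role of CharTokenizer.isTokenChar: letters only
    // (LetterTokenizer behaviour) or letters plus digits.
    protected boolean isTokenChar(char c) {
        return allowDigits ? Character.isLetterOrDigit(c) : Character.isLetter(c);
    }

    // Splits text into maximal runs of token characters.
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the final token
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Published in 1945, revised 2000.";
        // Letters only: the years are dropped.
        System.out.println(new AllowDigitsTokenizerSketch(false).tokenize(text));
        // Letters and digits: the years survive as tokens.
        System.out.println(new AllowDigitsTokenizerSketch(true).tokenize(text));
    }
}
```

Running main prints [Published, in, revised] for the letters-only case and [Published, in, 1945, revised, 2000] when digits are allowed.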

Apologies if this has been discussed earlier. I googled for the relevant terms and found nothing.

Thanks,
Peter Pimley,
Semantico.




