Tom Allison wrote:

> I want to break up every email based on a token defined as
> /(\w\w\w+)/g; This will give me every "word" of three or more letters.
Alternative: /(\w{3,})/g

> But when I'm getting mail that is in UTF-8 format this doesn't work
> the way I want it to as I can't see an umlaut (or similar) as
> matching a '\w'.

First read perlunitut: http://juerd.nl/perlunitut.html
(which of course has a SEE ALSO section at the end)

More fun with \w etc.:
http://www.xs4all.nl/~rvtol/perl/unicount.pl
http://www.xs4all.nl/~rvtol/perl/unicount-WL.pl

-- 
Affijn, Ruud

"Gewoon is een tijger."
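
P.S. A minimal sketch of the approach perlunitut describes, assuming the mail body arrives as raw UTF-8 bytes: decode first, then match. The sample text and the variable names ($bytes, $text, @tokens) are made up for illustration; real mail would also need its declared charset and transfer encoding handled before this step.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes as they might arrive in a UTF-8 encoded mail body
# (hypothetical sample text, not from a real message).
my $bytes = "Gr\xC3\xBC\xC3\x9Fe aus K\xC3\xB6ln, bis sp\xC3\xA4ter";

# Decode the bytes into a Perl character string first. Without this
# step the regex is matched against individual bytes, not characters.
my $text = decode('UTF-8', $bytes);

# On the decoded string, \w matches Unicode word characters,
# so letters with umlauts count as part of a token.
my @tokens = $text =~ /(\w{3,})/g;

# Encode again on output so the terminal receives valid UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @tokens;
```

With this sample the five words come out whole, umlauts included. Note that \w also matches digits and underscore, so a "word" here is not strictly letters.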