I'm trying to run count.pl on a directory of Unicode documents (a sample
document is attached) using Perl 5 (v5.18.2). The output is a list of
digits and punctuation marks, without any of the Unicode words:
 2732
 .<>1589
 :<>626
 2<>19
 !<>17
 10<>16
 4<>14
 13<>13
 12<>13
 20<>12
 9<>11
 15<>11
 3<>10
 5<>10
Is it possible to ask count.pl to tokenize the input files on whitespace only?

There is a --token option which may be useful, but I don't know how to use it.
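
To make the request concrete, here is a rough Python sketch of the behaviour I am after: split on whitespace only, so that non-ASCII words stay intact, then print the counts in the same `token<>frequency` layout as above. This is not count.pl code and says nothing about how --token actually works; the function names and the sample string are made up for illustration.

```python
from collections import Counter


def count_whitespace_tokens(text):
    """Split on runs of whitespace only and count each resulting token."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # so non-ASCII words are kept whole and nothing is dropped.
    return Counter(text.split())


def format_counts(counts):
    """Render the counts as 'token<>frequency' lines, most frequent first."""
    lines = [str(sum(counts.values()))]  # first line: total number of tokens
    # Sort by descending frequency, then by token so ties are deterministic.
    for token, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{token}<>{freq}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the attached document.
    sample = "καλημέρα κόσμε καλημέρα ."
    print(format_counts(count_whitespace_tokens(sample)))
```

With this kind of splitting, punctuation only becomes a separate token when it is surrounded by spaces in the input, which is fine for my documents.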
