Paul Smith wrote:
michael higgins wrote:
Does somebody know about a program for giving us the number of occurrences of each word in a text file?
Well, this sounds like a good job for perl...
perl -e 'open F,$ARGV[0];for(<F>){for(split /\W/,$_){$t{lc $_}++}}for(sort keys %t){print $_, " => ", $t{$_}, "\n"}' forums.txt
Thanks to all who have tried to help me.
Michael: your perl program does not work very well with accented words. Please, apply it to the following text:
O primeiro-ministro iraquiano, Iyad Allaoui, deu uma última hipótese às milícias de Moqtada al Sadr para deporem as armas e abandonarem a mesquita do Iman Ali. Em Nadjaf, o ruído de tiros e explosões regressaram.
Paul
Sorry, Paul. I didn't consider that, being practically monolingual. It's clunky, but FWIW:
perl -e 'open F,"<:encoding(iso-8859-9)",$ARGV[0];for(<F>){for(split /\W+/,$_){$t{lc $_}++}}for(map{$_->[0]}sort {$a->[1] <=> $b->[1]|| $a->[0] cmp $b->[0]}map{[$_,$t{$_}]} keys %t){print $_, " => ", $t{$_}, "\n"}' thenews.txt
...works for me like this:
a => 1
abandonarem => 1
al => 1
ali => 1
allaoui => 1
armas => 1
as => 1
deporem => 1
deu => 1
do => 1
em => 1
explosões => 1
hipótese => 1
iman => 1
iraquiano => 1
iyad => 1
mesquita => 1
milícias => 1
ministro => 1
moqtada => 1
nadjaf => 1
para => 1
primeiro => 1
regressaram => 1
ruído => 1
sadr => 1
tiros => 1
uma => 1
às => 1
última => 1
de => 2
e => 2
o => 2
But, at this point it looks like a form of self-torture - not very useful as a '1-liner'. I don't know if this works on perl < v5.8 either.
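For anyone who would rather not squint at the one-liner, here is the same idea unrolled into a small script. Treat it as a rough sketch, not a polished tool. The sub names count_words and sorted_words are made up for illustration. The input encoding is the same iso-8859-9 used in the one-liner, which is only a guess about the file, and the UTF-8 output on STDOUT is an assumption about the terminal. The encoding layer in open needs perl 5.8 or later.

```perl
#!/usr/bin/perl
# Word-frequency count, unrolled from the one-liner.
# Assumptions: input is iso-8859-9 (as in the one-liner), terminal wants UTF-8.
use strict;
use warnings;

# Count case-folded words in a list of already-decoded lines.
# (count_words is a hypothetical helper name.)
sub count_words {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # \w+ grabs runs of word characters; on decoded text that
        # includes accented letters, and it never yields empty strings.
        $count{ lc $1 }++ while $line =~ /(\w+)/g;
    }
    return \%count;
}

# Order words by ascending count, then by code point within a count.
# (sorted_words is a hypothetical helper name.)
sub sorted_words {
    my ($count) = @_;
    return sort { $count->{$a} <=> $count->{$b} || $a cmp $b } keys %$count;
}

# Only read a file when a file name is given on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<:encoding(iso-8859-9)', $file
        or die "Cannot open $file: $!";
    binmode STDOUT, ':encoding(UTF-8)';    # assumption: UTF-8 terminal
    my $count = count_words(<$fh>);
    close $fh;
    print "$_ => $count->{$_}\n" for sorted_words($count);
}
```

Save it as, say, wordcount.pl and run it as: perl wordcount.pl thenews.txt. Sorting the keys directly on the count hash gives the same order as the map/sort/map chain in the one-liner, just with less line noise.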
Sorry to muddy the waters. I'll go back to lurking now, until I figure out how to get a mandrake install on my beige mac g3. It seems perhaps I don't drink enough beer. '-)
-- mike higgins
