Paul Smith wrote:

michael higgins wrote:

Does somebody know about a program for giving us the number of occurrences of each word in a text file?


Well, this sounds like a good job for perl...

perl -e 'open F,$ARGV[0];for(<F>){for(split /\W/,$_){$t{lc $_}++}}for(sort keys %t){print $_, " => ", $t{$_}, "\n"}' forums.txt


Thanks to all who have tried to help me.

Michael: your perl program does not work very well with accented words. Please, apply it to the following text:

O primeiro-ministro iraquiano, Iyad Allaoui, deu uma última hipótese às milícias de Moqtada al Sadr para deporem as armas e abandonarem a mesquita do Iman Ali. Em Nadjaf, o ruído de tiros e explosões regressaram.

Paul


Sorry, Paul. I didn't consider that, being practically monolingual. It's clunky, but FWIW:


perl -e 'open F,"<:encoding(iso-8859-9)",$ARGV[0];for(<F>){for(split /\W+/,$_){$t{lc $_}++}}for(map{$_->[0]}sort {$a->[1] <=> $b->[1]|| $a->[0] cmp $b->[0]}map{[$_,$t{$_}]} keys %t){print $_, " => ", $t{$_}, "\n"}' thenews.txt

...works for me like this:

a => 1
abandonarem => 1
al => 1
ali => 1
allaoui => 1
armas => 1
as => 1
deporem => 1
deu => 1
do => 1
em => 1
explosões => 1
hipótese => 1
iman => 1
iraquiano => 1
iyad => 1
mesquita => 1
milícias => 1
ministro => 1
moqtada => 1
nadjaf => 1
para => 1
primeiro => 1
regressaram => 1
ruído => 1
sadr => 1
tiros => 1
uma => 1
às => 1
última => 1
de => 2
e => 2
o => 2

But, at this point it looks like a form of self-torture - not very useful as a '1-liner'. I don't know if this works on perl < v5.8 either.
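For what it's worth, here is a minimal sketch of the same idea written out as a script instead of a one-liner. The names count_words and sorted_words are made up for illustration, and the default input encoding (iso-8859-1) and the UTF-8 output are assumptions, so adjust both for your file and terminal. It relies on PerlIO encoding layers, so it assumes perl 5.8 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count lowercased words in a list of already-decoded lines.
# (count_words is a hypothetical helper name, not from the one-liner.)
sub count_words {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # On decoded text, \w also matches accented letters.
        $count{lc $1}++ while $line =~ /(\w+)/g;
    }
    return \%count;
}

# Order words by ascending count, then alphabetically,
# which is the ordering the one-liner produces.
sub sorted_words {
    my ($count) = @_;
    return sort { $count->{$a} <=> $count->{$b} || $a cmp $b } keys %$count;
}

if (@ARGV) {
    my $file = shift @ARGV;
    # Assumption: Latin-1 input unless an encoding name is given.
    my $encoding = @ARGV ? shift @ARGV : 'iso-8859-1';

    open my $fh, "<:encoding($encoding)", $file
        or die "Cannot open $file: $!\n";
    # Assumption: the terminal expects UTF-8.
    binmode STDOUT, ':encoding(UTF-8)';

    my $count = count_words(<$fh>);
    close $fh;

    print "$_ => $count->{$_}\n" for sorted_words($count);
}
```

Saved as, say, wordcount.pl, it would be run as "perl wordcount.pl thenews.txt" or, with an explicit encoding, "perl wordcount.pl thenews.txt iso-8859-15". Sorting the keys directly on their counts avoids the map/sort/map dance in the one-liner.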

Sorry to muddy the waters. I'll go back to lurking now, until I figure out how to get a mandrake install on my beige mac g3. It seems perhaps I don't drink enough beer. '-)

-- mike higgins

