Hi Arezki,

I will use Unix-style notation for files, since that is more comfortable for me, but I think the basic idea remains the same. For any file (let's call it input.txt) you can find the Mutual Information of the bigrams in it via the following two steps:

  count.pl --ngram 2 output.txt input.txt

  statistic.pl tmi output-mi.txt output.txt

OR

  statistic.pl pmi output-mi.txt output.txt

Note that there are two ways of finding Mutual Information, one we refer to as true mutual information (tmi) and the other as pointwise mutual information (pmi). For finding collocations, pmi tends to be what people are referring to when they talk about mutual information, but we also provide tmi in the event someone means the more classical definition of mutual information from information theory. The differences between the two are described in the perldoc for each measure:

  http://search.cpan.org/dist/Text-NSP/lib/Text/NSP/Measures/2D/MI/pmi.pm
  http://search.cpan.org/dist/Text-NSP/lib/Text/NSP/Measures/2D/MI/tmi.pm

There are also some handy options with count.pl that let you eliminate stop words and things like that, but the above is the most basic way to run things, and that's probably a good starting point.

The output from statistic.pl comes in sorted order. If you want to get a list for each of your files, just run them separately. If you want one big list for all of the files, you can specify as many input files on the command line as you like, as in:

  count.pl --ngram 2 output.txt input1.txt input2.txt input3.txt

Then you could run statistic.pl on output.txt as described above.

I hope this all helps. Let us know if further questions arise.

Good luck!
Ted

On Fri, Aug 29, 2008 at 10:15 AM, arezki20002002 <[EMAIL PROTECTED]> wrote:
> Hi Ted,
>
> I have a collection of text documents, "coll.txt",
> which contains:
> D:\c.txt
> D:\e.txt
> D:\d.txt
> D:\a.txt
> D:\f.txt
> D:\g.txt
> D:\h.txt
> D:\j.txt
> How can I apply the MI measure to extract the bigrams in this
> collection and put them in separate files with their decreasing scores?
>
> Best Regards
> Arezki
>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
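
As a rough illustration of what a pmi score measures for each bigram, here is a minimal Python sketch. It is not the Text-NSP implementation: it assumes whitespace tokenization, counts only adjacent word pairs, and uses log base 2 of the observed bigram count divided by the count expected if the two words were independent. The function names bigram_pmi and ranked are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): pmi} for adjacent token pairs (assumed: log base 2)."""
    bigrams = list(zip(tokens, tokens[1:]))
    total = len(bigrams)                              # total number of bigrams
    pair_counts = Counter(bigrams)                    # times w1 w2 occur together
    left_counts = Counter(w1 for w1, _ in bigrams)    # times w1 is the first word
    right_counts = Counter(w2 for _, w2 in bigrams)   # times w2 is the second word

    scores = {}
    for (w1, w2), observed in pair_counts.items():
        # Count expected under independence of the two positions.
        expected = left_counts[w1] * right_counts[w2] / total
        scores[(w1, w2)] = math.log2(observed / expected)
    return scores


def ranked(scores):
    """Sort bigrams by decreasing score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy text, made up for illustration only.
    text = "new york is big and new york is busy and the city is big"
    for (w1, w2), score in ranked(bigram_pmi(text.split())):
        print(f"{w1} {w2}\t{score:.4f}")
```

On real data the counting and scoring would be done by count.pl and statistic.pl as described in the message; the sketch is only meant to show why bigrams whose words occur together more often than chance would predict receive higher scores.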