On Wed, Mar 25, 2009 at 6:50 AM, arezki20002002 <arezki20002...@yahoo.fr> wrote:
> Hello,
> once the file generated by statistic.pl
> how can I know a bigram appears in this file?
> thank you
> Arezki

Hi Arezki,

I tend to use the grep command to search through my statistic.pl output when I'm looking for a specific ngram. For example, I processed the biography "The Fabulous Life of Diego Rivera" as follows:

    count.pl fab.out fabulous-life-of-diego-rivera.txt
    statistic.pl ll.pm fab-ll.out fab.out

Then I decided I wanted to find out if "Tina Modotti" occurred in that book:

    marimba(22): grep "Tina<>Modotti" fab-ll.out
    Tina<>Modotti<>146 231.4471 13 26 14

This tells me that it did (13 times), and that it was the 146th ranked bigram according to log-likelihood (score 231.4471). Tina occurred 26 times as the first word of a bigram, and Modotti occurred 14 times as the second word of a bigram.

I also searched for just Modotti:

    marimba(23): grep "Modotti" fab-ll.out
    Tina<>Modotti<>146 231.4471 13 26 14
    Modotti<>.<>1624 39.2108 9 14 7804
    Modotti<>rejected<>6575 11.4513 1 14 17
    Modotti<>served<>6641 11.3337 1 14 18
    than<>Modotti<>11839 5.9592 1 262 14
    Modotti<>was<>16621 2.5137 1 14 1611
    Modotti<>and<>19152 0.8857 1 14 4451
    Modotti<>,<>21349 0.0072 1 14 14352

Among other things, here I can see that Modotti is the second word of two different bigrams: "Tina Modotti" (13 times, as we saw above) and "than Modotti" (1 time). That confirms the total of 14 bigrams where Modotti is the second word.

Fishing around like this can be quite fun. You could also use egrep to specify extended regular expression patterns to search for (rather than just simple strings), but I find grep to be a nice starting point.

I hope this is helpful!

Cordially,
Ted

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
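
As a follow-up to the egrep suggestion in the reply, here is a minimal sketch of how anchored patterns might separate "Modotti as first word" from "Modotti as second word". It uses grep -E (the equivalent of egrep). The sample lines are copied from the output quoted in the reply and written to a temporary file purely for illustration; in practice you would point grep at your own statistic.pl output (fab-ll.out in the example). The awk step assumes the whitespace-separated layout shown in that output, where the third field is the bigram's joint frequency.

```shell
# Sample lines copied from the fab-ll.out output quoted in the reply.
# The temp file is only for illustration; use your real output file instead.
out=$(mktemp)
cat > "$out" <<'EOF'
Tina<>Modotti<>146 231.4471 13 26 14
Modotti<>.<>1624 39.2108 9 14 7804
Modotti<>rejected<>6575 11.4513 1 14 17
Modotti<>served<>6641 11.3337 1 14 18
than<>Modotti<>11839 5.9592 1 262 14
Modotti<>was<>16621 2.5137 1 14 1611
Modotti<>and<>19152 0.8857 1 14 4451
Modotti<>,<>21349 0.0072 1 14 14352
EOF

# Bigrams whose FIRST word is Modotti: anchor the pattern at line start.
first=$(grep -E '^Modotti<>' "$out")

# Bigrams whose SECOND word is Modotti: require the <> separator on both sides.
second=$(grep -E '<>Modotti<>' "$out")

# Sum the joint frequency (third whitespace-separated field) over the
# second-word matches; this should agree with the marginal count of 14.
total=$(printf '%s\n' "$second" | awk '{ s += $3 } END { print s }')

printf 'First-word bigrams:\n%s\n\n' "$first"
printf 'Second-word bigrams:\n%s\n\n' "$second"
echo "Total occurrences as second word: $total"   # prints 14 for the sample above

rm -f "$out"
```

The anchored patterns avoid the mixed listing that a plain search for "Modotti" produces, and the awk sum performs the same sanity check described in the reply (13 + 1 = 14).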