Fortunately there is no need to write code! You can just use the
--newline option with count.pl, which prevents ngrams from crossing
line boundaries.

ted@charango:~$ cat test.txt
my dog is nice
i lkie my dog

ted@charango:~$ count.pl test.cnt test.txt

ted@charango:~$ cat test.cnt
7
my<>dog<>2 2 2
is<>nice<>1 1 1
nice<>i<>1 1 1
lkie<>my<>1 1 1
dog<>is<>1 1 1
i<>lkie<>1 1 1

ted@charango:~$ count.pl test1.cnt test.txt --newline

ted@charango:~$ cat test1.cnt
6
my<>dog<>2 2 2
is<>nice<>1 1 1
lkie<>my<>1 1 1
dog<>is<>1 1 1
i<>lkie<>1 1 1

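Notice that the bigram "nice i", which spans the line break between
the two lines of test.txt, is excluded when using --newline, so the
total number of bigrams drops from 7 to 6.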
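
If it helps to see why the totals differ, here is a rough Python sketch of
the idea. It is only an illustration, not how count.pl is implemented: the
function name count_bigrams and its newline parameter are made up for this
example, and it splits tokens on whitespace, which is a simplification of
count.pl's tokenization.

```python
from collections import Counter


def count_bigrams(text, newline=False):
    """Count adjacent token pairs (hypothetical helper, not part of count.pl).

    With newline=True each line is processed on its own, so no pair can
    span a line break. Tokens are split on whitespace (an assumption).
    """
    if newline:
        units = [line.split() for line in text.splitlines()]
    else:
        units = [text.split()]  # treat the whole text as one token stream
    counts = Counter()
    for tokens in units:
        # Pair each token with the one that follows it in the same unit.
        counts.update(zip(tokens, tokens[1:]))
    return counts


text = "my dog is nice\ni lkie my dog\n"
across_lines = count_bigrams(text)
within_lines = count_bigrams(text, newline=True)

print(sum(across_lines.values()), sum(within_lines.values()))  # 7 6
print(("nice", "i") in across_lines, ("nice", "i") in within_lines)  # True False
```

The only difference between the two modes is where the token stream gets
cut, which is why exactly one bigram disappears for a two-line file.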

I hope this helps!

Good luck,
Ted
