Fortunately there is no need to write code! You can just use the --newline option with count.pl, which prevents ngrams from crossing line boundaries.

ted@charango:~$ cat test.txt
my dog is nice
i lkie my dog
ted@charango:~$ count.pl test.cnt test.txt
ted@charango:~$ cat test.cnt
7
my<>dog<>2 2 2
is<>nice<>1 1 1
nice<>i<>1 1 1
lkie<>my<>1 1 1
dog<>is<>1 1 1
i<>lkie<>1 1 1
ted@charango:~$ count.pl test1.cnt test.txt --newline
ted@charango:~$ cat test1.cnt
6
my<>dog<>2 2 2
is<>nice<>1 1 1
lkie<>my<>1 1 1
dog<>is<>1 1 1
i<>lkie<>1 1 1

Notice that the bigram "nice i" is excluded when using --newline.

I hope this helps!

Good luck,
Ted
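
P.S. For anyone curious what the option amounts to conceptually, here is a rough Python sketch. It is not how count.pl is implemented: count_bigrams is a hypothetical helper, it assumes simple whitespace tokenization, and it tracks only the pair counts rather than the additional per-word frequencies that count.pl prints. It just shows the difference between pairing tokens across the whole text and pairing them within each line.

```python
from collections import Counter


def count_bigrams(text, newline=False):
    """Count adjacent token pairs (hypothetical helper, not part of NSP).

    With newline=True, tokens are only paired within the same line,
    so no bigram crosses a line boundary.
    """
    # Either treat each line as its own unit, or the whole text as one unit.
    units = text.splitlines() if newline else [text]
    counts = Counter()
    for unit in units:
        tokens = unit.split()  # assumption: whitespace tokenization
        counts.update(zip(tokens, tokens[1:]))
    return counts


text = "my dog is nice\ni lkie my dog\n"
crossing = count_bigrams(text)
bounded = count_bigrams(text, newline=True)

print(sum(crossing.values()), sum(bounded.values()))        # 7 6
print(("nice", "i") in crossing, ("nice", "i") in bounded)  # True False
```

The totals match the 7 and 6 on the first line of each .cnt file above, and the only pair that disappears is "nice i".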