Great! Thanks very much, Dr. Ted.
Most of the tools I know do not do that, not even a big one like NLTK.

--- In ngram@yahoogroups.com, Ted Pedersen <tpederse@...> wrote:
>
> Fortunately there is no need to write code! You can just use the
> --newline option with count.pl. This will prevent ngrams from crossing
> line boundaries!!
>
> ted@charango:~$ cat test.txt
> my dog is nice
> i lkie my dog
>
> ted@charango:~$ count.pl test.cnt test.txt
>
> ted@charango:~$ cat test.cnt
> 7
> my<>dog<>2 2 2
> is<>nice<>1 1 1
> nice<>i<>1 1 1
> lkie<>my<>1 1 1
> dog<>is<>1 1 1
> i<>lkie<>1 1 1
>
> ted@charango:~$ count.pl test1.cnt test.txt --newline
>
> ted@charango:~$ cat test1.cnt
> 6
> my<>dog<>2 2 2
> is<>nice<>1 1 1
> lkie<>my<>1 1 1
> dog<>is<>1 1 1
> i<>lkie<>1 1 1
>
> Notice that the bigram "nice i" is excluded when using --newline.
>
> I hope this helps!
>
> Good luck,
> Ted
>
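
For anyone who needs the same behaviour outside count.pl, here is a rough Python sketch of the idea in the example quoted above: count bigrams either over the whole text or line by line, so that no bigram crosses a line boundary. The function name bigram_counts and its newline parameter are made up for illustration, and plain whitespace splitting is an assumption; it is not how count.pl tokenizes, and it only reproduces the bigram totals, not the marginal counts.

```python
from collections import Counter


def bigram_counts(text, newline=False):
    """Count adjacent token pairs (hypothetical helper, not part of NSP).

    With newline=True each line is processed on its own, so no bigram
    spans two lines. Tokens are split on whitespace (an assumption).
    """
    counts = Counter()
    # Treat every line as a separate unit, or the whole text as one unit.
    units = text.splitlines() if newline else [text]
    for unit in units:
        tokens = unit.split()
        # Pair each token with its right-hand neighbour.
        counts.update(zip(tokens, tokens[1:]))
    return counts


text = "my dog is nice\ni lkie my dog\n"
crossing = bigram_counts(text)
per_line = bigram_counts(text, newline=True)

print(sum(crossing.values()), sum(per_line.values()))  # 7 6
print(("nice", "i") in crossing, ("nice", "i") in per_line)  # True False
```

On the two-line test file this gives the same totals as the quoted run (7 bigrams without the option, 6 with it), and "nice i" disappears once lines are handled separately.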