Great!

Thanks very much, Dr. Ted.

Most of the tools I know do not offer this, not even a big toolkit like NLTK.


--- In ngram@yahoogroups.com, Ted Pedersen <tpederse@...> wrote:
>
> Fortunately, there is no need to write code! You can just use the
> --newline option with count.pl. This will prevent ngrams from crossing
> line boundaries!
> 
> ted@charango:~$ cat test.txt
> my dog is nice
> i lkie my dog
> 
> ted@charango:~$ count.pl test.cnt test.txt
> 
> ted@charango:~$ cat test.cnt
> 7
> my<>dog<>2 2 2
> is<>nice<>1 1 1
> nice<>i<>1 1 1
> lkie<>my<>1 1 1
> dog<>is<>1 1 1
> i<>lkie<>1 1 1
> ted@charango:~$ count.pl test1.cnt test.txt --newline
> 
> ted@charango:~$ cat test1.cnt
> 6
> my<>dog<>2 2 2
> is<>nice<>1 1 1
> lkie<>my<>1 1 1
> dog<>is<>1 1 1
> i<>lkie<>1 1 1
> 
> Notice that the bigram "nice i" is excluded when using --newline, since
> it spans the line break, so the total count drops from 7 to 6.
> 
> I hope this helps!
> 
> Good luck,
> Ted
>
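For anyone curious what --newline changes, here is a minimal Python sketch of the same bigram counting, with and without respecting line boundaries. It is not count.pl's implementation: it assumes plain whitespace tokenization (count.pl has its own token definitions), and the names count_bigrams and format_count_pl_style are hypothetical. The output imitates the layout of the .cnt files above (total bigram count on the first line, then word1<>word2<>n11 n1p np1, i.e. the bigram frequency followed by how often word1 appears first and word2 appears second in any bigram), though the order of tied lines may differ.

```python
from collections import Counter


def count_bigrams(text, respect_newlines=False):
    """Count bigrams in text; return (total, Counter of (w1, w2) pairs).

    Assumption: tokens are whitespace-separated (unlike count.pl's
    configurable token definitions).
    """
    if respect_newlines:
        # Treat each line as its own segment, so no bigram spans a line break.
        segments = [line.split() for line in text.splitlines()]
    else:
        # Treat the whole text as one token stream, so bigrams cross lines.
        segments = [text.split()]
    bigrams = Counter()
    for tokens in segments:
        bigrams.update(zip(tokens, tokens[1:]))
    return sum(bigrams.values()), bigrams


def format_count_pl_style(bigrams):
    """Render counts as 'w1<>w2<>n11 n1p np1' lines, preceded by the total."""
    first = Counter()   # n1p: bigrams with this word in first position
    second = Counter()  # np1: bigrams with this word in second position
    for (w1, w2), n in bigrams.items():
        first[w1] += n
        second[w2] += n
    lines = [str(sum(bigrams.values()))]
    for (w1, w2), n in bigrams.most_common():
        lines.append(f"{w1}<>{w2}<>{n} {first[w1]} {second[w2]}")
    return lines


if __name__ == "__main__":
    text = "my dog is nice\ni lkie my dog\n"
    for flag in (False, True):
        _, bigrams = count_bigrams(text, respect_newlines=flag)
        print("\n".join(format_count_pl_style(bigrams)))
        print()
```

Running it prints two listings; the line nice<>i<>1 1 1 appears only in the first, mirroring the difference between test.cnt and test1.cnt.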

