Hi Jayaram,

Yes, to restrict ngrams to individual sentences you need to use the --newLine option and make sure that your input has exactly one sentence per line. Identifying sentence boundaries is a non-trivial problem, but we have some simple code available as part of our WordNet::SenseRelate::AllWords package that could be a useful starting point for a sentence boundary detector.

http://cpansearch.perl.org/src/TPEDERSE/WordNet-SenseRelate-AllWords-0.13/utils/sentence_split.pl

This is not intended to "solve" the problem, but it will do a reasonable approximation of sentence boundary detection.

I hope this helps!

Cordially,
Ted

On Fri, Feb 6, 2009 at 1:01 AM, jayaram raji <jayaram_raji2...@yahoo.com> wrote:
> Dear Ted,
>
> In order to achieve what Christos has asked, is it necessary to arrange the
> data in such a way that there is only one sentence per line? If it is
> running text, how does it identify the end of the sentence?
>
> Thanks
> Jayaram
>
> --- On Thu, 2/5/09, Ted Pedersen <duluth...@gmail.com> wrote:
>
> From: Ted Pedersen <duluth...@gmail.com>
> Subject: Re: [ngram] No ngram over sentence
> To: ngram@yahoogroups.com
> Date: Thursday, February 5, 2009, 9:41 PM
>
> Hi Christos,
>
> In order to count as you describe, you just need to use the --newLine
> option.
>
> If you run
>
> count.pl --help
>
> you can see all the command line options. Among them is ...
>
> --newLine Prevents n-grams from spanning across the
> new-line character.
>
> which should do exactly as you wish!
>
> Happy Counting, :)
> Ted
>
> On Thu, Feb 5, 2009 at 8:29 AM, christos.braeunle
> <christos.braeunle@yahoo.com> wrote:
>> Hello
>>
>> I started using the NSP package and I am really impressed by its power.
>> First of all, thanks for that great tool!
>>
>> Now I have run into a problem when building ngrams. I want to tell count.pl
>> not to create ngrams across the end of a sentence.
>>
>> For example: I have two sentences.
>>
>> Vincent loves Honey Bunny
>> A women snorts
>>
>> Now when building bigrams I would like to get:
>>
>> Vincent<>loves
>> loves<>Honey
>> Honey<>Bunny
>> A<>women
>> women<>snorts
>>
>> so I want the bigram Bunny<>A not to be created (and not to be counted).
>>
>> Is there a way to achieve this?
>>
>> I hope my question is understandable and has not been asked before.
>>
>> If I missed some relevant documentation, I would be glad to be pointed
>> to it.
>>
>> Thanks a lot
>>
>> Christos Bräunle
>>
>>
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
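
As a rough illustration of the preprocessing discussed in this thread, here is a minimal Python sketch that splits running text into one sentence per line and then counts bigrams without letting any bigram cross a line. It is not the sentence_split.pl script linked above and not how count.pl is implemented. The function names, the boundary rule (end punctuation followed by whitespace) and the \w+ tokenization are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed boundary rule, for illustration only: a sentence ends at '.', '!'
# or '?' followed by whitespace. Abbreviations such as "Dr." will be split
# in the wrong place, so this is only an approximation.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the sentences found in text, one list element per sentence."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def bigrams_per_line(lines):
    """Count bigrams line by line so that no bigram spans two lines."""
    counts = Counter()
    for line in lines:
        # Simplified tokenization (runs of word characters); this is an
        # assumption and may differ from the tokens count.pl would produce.
        tokens = re.findall(r"\w+", line)
        counts.update(zip(tokens, tokens[1:]))
    return counts


if __name__ == "__main__":
    text = "Vincent loves Honey Bunny. A women snorts."
    sentences = split_sentences(text)
    # One sentence per line, as the --newLine option expects.
    print("\n".join(sentences))
    for (first, second), n in bigrams_per_line(sentences).items():
        print(f"{first}<>{second} {n}")
```

Writing the output of split_sentences to a file, one sentence per line, gives input in the form that --newLine expects. Because each line is tokenized separately, the bigram Bunny<>A from Christos's example is never formed.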