Dear Ted,
 
In order to achieve what Christos has asked, is it necessary to arrange the 
data so that there is only one sentence per line?  If it is running text, 
how does count.pl identify the end of a sentence?
 
Thanks
Jayaram

--- On Thu, 2/5/09, Ted Pedersen <duluth...@gmail.com> wrote:

From: Ted Pedersen <duluth...@gmail.com>
Subject: Re: [ngram] No ngram over sentence
To: ngram@yahoogroups.com
Date: Thursday, February 5, 2009, 9:41 PM

Hi Christos,

In order to count as you describe, you just need to use the --newLine option.

If you run

count.pl --help

you can see all the command line options. Among them is ...

--newLine    Prevents n-grams from spanning across the
             new-line character.

which should do exactly as you wish!
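
As a rough illustration of what that option means, here is a minimal Python
sketch. It is not NSP's actual implementation; the function name
bigrams_within_lines and the plain whitespace tokenization are assumptions
made only for this example. It counts bigrams line by line, so no pair ever
crosses a new-line character:

```python
from collections import Counter


def bigrams_within_lines(text):
    """Count bigrams without pairing tokens from different lines."""
    counts = Counter()
    for line in text.splitlines():
        # Assumption: tokens are separated by whitespace only.
        tokens = line.split()
        # Pair each token with its successor on the same line.
        counts.update(zip(tokens, tokens[1:]))
    return counts


text = "Vincent loves Honey Bunny\nA women snorts\n"
counts = bigrams_within_lines(text)
for (first, second), n in counts.items():
    print(f"{first}<>{second} {n}")
```

With the two example sentences on separate lines, the sketch yields the five
bigrams listed in the original question and never pairs Bunny with A.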

Happy Counting, :)
Ted

On Thu, Feb 5, 2009 at 8:29 AM, christos.braeunle
<christos.braeunle@yahoo.com> wrote:
> Hello
>
> I started using the NSP package and I am really impressed by its power.
> First of all, thanks for that great tool!
>
> Now I have run into a problem when building n-grams. I want to tell count.pl
> not to create n-grams across the end of a sentence.
>
> For example, I have two sentences:
>
> Vincent loves Honey Bunny
> A women snorts
>
> Now when building bigrams, I would like to get:
>
> Vincent<>loves
> loves<>Honey
> Honey<>Bunny
> A<>women
> women<>snorts
>
> So I want the bigram Bunny<>A not to be created (or counted).
>
> Is there a way to achieve this?
>
> I hope my question is understandable and has not been asked before.
>
> If I missed some relevant documentation, I would be glad to be pointed
> to it.
>
> Thanks a lot
>
> Christos Bräunle
>
> 

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse