So, in other words, punctuation such as . and , is not used at all by the 
algorithms/measures, and I should get the same results if I remove it before I 
run count.pl and statistic.pl, correct?

--- In ngram@yahoogroups.com, Ying Liu <liux0395@...> wrote:
>
> Hi Patrick,
> 
> You need to pre-process the text (data cleaning) to remove
> punctuation before running count.pl. In the same way, you
> need to post-process the output to get the bigrams or trigrams
> in the format you want.
> 
> Thanks,
> Ying
> 
> semiotica24 wrote:
> >
> > Sorry for the basic questions:
> > 1. I need 2 versions of output for each list of bigrams and trigrams 
> > that I create using the various measures in count.pl and statistic.pl: 
> > one with the default statistics and one without. How do I format the 
> > output to exclude the statistics?
> > e.g.:
> > mobile<>phones<>100 280 384
> > cellular<>phones<>96 214 384
> >
> > mobile phones
> > cellular phones
> >
> > 2. I need to remove the punctuation marks . and , from the text. I've 
> > tried adding them to my stopword list, but I don't have the tags quite 
> > right. How should I enter them in my stop file?
> >
> > Thanks!
> >
> > Patrick
> >
> >
>
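
For anyone following the thread, here is a minimal sketch of the two steps Ying describes, written in Python rather than as part of the package. It assumes that each output line has the form shown in the question (words joined by <> with the counts in the last <> field) and that stripping the ASCII marks . , ; : ? ! is enough for your data. The function names strip_punctuation and ngram_words are made up for this example and are not part of count.pl or statistic.pl.

```python
import re

# Assumption: only these ASCII marks count as punctuation to remove.
PUNCT = re.compile(r"[.,;:?!]")


def strip_punctuation(text):
    """Pre-processing: replace each punctuation mark with a space,
    then collapse runs of spaces/tabs and trim the ends."""
    no_punct = PUNCT.sub(" ", text)
    return re.sub(r"[ \t]+", " ", no_punct).strip()


def ngram_words(line):
    """Post-processing: return only the words of one output line.

    Assumes the line looks like 'w1<>w2<>numbers'. Lines without '<>'
    (for example a header line holding a single total) give None.
    """
    line = line.strip()
    if "<>" not in line:
        return None
    fields = line.split("<>")
    # The last field holds the numbers; everything before it is a word.
    return " ".join(fields[:-1])


if __name__ == "__main__":
    print(strip_punctuation("Mobile phones, cellular phones."))
    for row in ["mobile<>phones<>100 280 384", "cellular<>phones<>96 214 384"]:
        print(ngram_words(row))
```

Because the last <> field is dropped regardless of how many words precede it, the same function should work for trigram lines too. One thing to keep in mind: once a comma or period is removed, the words on either side of it become adjacent, so the resulting counts are only the same as before if the punctuation was not being counted as a token in the first place.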

