So, in other words, punctuation such as . and , is not used at all by the algorithms/measures, and I should get the same results if I remove it before I run count.pl and statistic.pl, correct?
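For reference, here is a minimal sketch of the kind of pre-processing I have in mind, written in Python rather than Perl. The function name `strip_punctuation` is my own and is not part of the package, and I am assuming only . and , need to go. Each mark is replaced with a space so that the words on either side do not get glued together. If count.pl is set up to treat punctuation marks as tokens, removing them changes which words are adjacent, so it is probably worth comparing the two outputs rather than assuming they match.

```python
import re

# Assumption: only the period and the comma are removed, as in the question.
PUNCT = re.compile(r"[.,]")


def strip_punctuation(text: str) -> str:
    """Replace each '.' and ',' with a space, then collapse runs of spaces/tabs."""
    no_punct = PUNCT.sub(" ", text)
    # Collapse the extra spaces left behind; newlines are kept as they are.
    return re.sub(r"[ \t]+", " ", no_punct).strip()
```

The cleaned string can then be written to a new file and that file passed to count.pl in place of the original. Note that this also splits numbers such as 3.5 into two tokens, which may or may not be what you want.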
--- In ngram@yahoogroups.com, Ying Liu <liux0395@...> wrote:
>
> Hi Patrick,
>
> You need to pre-process the text (data cleaning) to remove
> punctuations before run by count.pl. The same idea, you
> need to post-process to get the format you want of the bigrams
> or trigrams.
>
> Thanks,
> Ying
>
> semiotica24 wrote:
> >
> > Sorry for the basic questions:
> > 1. I need 2 versions of output for each list of bigrams and trigrams
> > that I create using the various measures in count.pl and statistic.pl:
> > one with the default statistics and one without. How do I format to
> > exclude the statistics?
> > e.g.:
> > mobile<>phones<>100 280 384
> > cellular<>phones<>96 214 384
> >
> > mobile phones
> > cellular phones
> >
> > 2. I need to remove punctuation . and , I've tried within my stopword
> > list, but I don't have the tags quite right. How should I enter into
> > my stop file?
> >
> > Thanks!
> >
> > Patrick
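
For the post-processing step mentioned in the quoted reply (question 1), a rough sketch along the same lines is below. It assumes every n-gram line looks like the sample in the thread, that is, words separated by <> with the numbers after the last <>. The names `ngram_words` and `strip_stats` are hypothetical helpers of mine, not anything shipped with count.pl or statistic.pl. Lines with no <> separator (a header or total line, if the output has one) are skipped.

```python
from typing import Iterable, List, Optional

SEPARATOR = "<>"  # separator seen in the sample output quoted in the thread


def ngram_words(line: str) -> Optional[str]:
    """Return the words of one output line joined by spaces, or None if the
    line contains no separator and so is not an n-gram line."""
    parts = line.rstrip("\n").split(SEPARATOR)
    if len(parts) < 2:
        return None
    # Everything after the last separator is the statistics field; drop it.
    return " ".join(parts[:-1])


def strip_stats(lines: Iterable[str]) -> List[str]:
    """Keep only the word part of each n-gram line, in the original order."""
    words = (ngram_words(line) for line in lines)
    return [w for w in words if w is not None]
```

Called on the two sample lines above, `strip_stats` gives back "mobile phones" and "cellular phones". Because it drops only the last field, the same code should handle trigram lines as well, provided they follow the same layout.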