Hi Patrick,

One other thing to think about is that the stoplist is designed to be used with bigrams. It is really intended to remove bigrams after a text has been chopped up into bigrams, not so much to remove individual words.
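To make "stopping bigrams rather than words" concrete, here is a rough Python sketch. It is not NSP's code. It assumes two stop modes: AND, where a bigram is dropped only if both of its words are on the list, and OR, where it is dropped if either word is. Treat those mode names and the regex-per-entry format as assumptions, and check the README section on stopping for the exact stop-file syntax. The stoplist patterns and sample text are made up for illustration.

```python
import re

# Hypothetical stoplist: each entry is a regex matched against a whole token
# (NSP stop files use Perl regexes; Python's re stands in for them here).
STOP_PATTERNS = [re.compile(p) for p in (r"^the$", r"^of$", r"^[.,]$")]


def is_stop(token):
    """True if the token matches any stoplist pattern."""
    return any(p.search(token) for p in STOP_PATTERNS)


def bigrams(tokens):
    """Chop a token sequence into adjacent pairs."""
    return list(zip(tokens, tokens[1:]))


def stop_bigrams(pairs, mode="AND"):
    """Filter bigrams *after* they have been formed (assumed semantics).

    mode="AND": drop a bigram only if both words are stop words.
    mode="OR":  drop a bigram if either word is a stop word.
    """
    combine = all if mode == "AND" else any
    return [pair for pair in pairs if not combine(is_stop(w) for w in pair)]


tokens = "the price of mobile phones , the price of cellular phones".split()
pairs = bigrams(tokens)
print(stop_bigrams(pairs, "AND"))  # "the" survives inside ("the", "price")
print(stop_bigrams(pairs, "OR"))
```

Under AND, the word "the" is not removed by itself. Only the bigram ("," , "the") disappears, because both of its words are stop words. This is why the stoplist is a poor tool for stripping individual words such as punctuation.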
More on stoplists here:

http://search.cpan.org/~tpederse/Text-NSP-1.23/doc/README.pod#5.6._"Stopping"_the_Ngrams:

Also, in addition to my --token suggestion, you could consider using --nontoken. The --token option excludes anything not defined in your token regex, whereas --nontoken excludes anything that is defined in the regex (so they are two sides of the same coin, I suppose).

http://search.cpan.org/~tpederse/Text-NSP-1.23/doc/README.pod#5.4_Removing_character_strings_via_--nontoken_option:

Hope this helps...
Ted

On Wed, Aug 17, 2011 at 1:30 PM, semiotica24 <semiotic...@yahoo.com> wrote:
> Sorry for the basic questions:
>
> 1. I need 2 versions of output for each list of bigrams and trigrams that I
> create using the various measures in count.pl and statistic.pl: one with
> the default statistics and one without. How do I format to exclude the
> statistics?
> e.g.:
> mobile<>phones<>100 280 384
> cellular<>phones<>96 214 384
>
> mobile phones
> cellular phones
>
> 2. I need to remove punctuation . and , I've tried within my stopword list,
> but I don't have the tags quite right. How should I enter into my stop file?
>
> Thanks!
>
> Patrick

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
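For the punctuation question, here is a rough Python sketch of how --token and --nontoken differ. It is a simplified model, not NSP's actual tokenizer. In NSP both options point to files of Perl regexes; here Python's re stands in for them, and the function names and sample text are hypothetical.

```python
import re

TEXT = "Mobile phones, cellular phones. Mobile phones!"


def tokenize_with_token(text, token_regex):
    """--token style (simplified): keep only substrings that match the token
    regex; everything else (spaces, any punctuation) is skipped."""
    return re.findall(token_regex, text)


def tokenize_with_nontoken(text, nontoken_regex, token_regex=r"\S+"):
    """--nontoken style (simplified): delete every substring that matches the
    nontoken regex, then split what is left into tokens."""
    cleaned = re.sub(nontoken_regex, " ", text)
    return re.findall(token_regex, cleaned)


# Define what a token IS: runs of word characters, so all punctuation goes.
print(tokenize_with_token(TEXT, r"\w+"))
# Define what to THROW AWAY: only periods and commas, so "!" survives.
print(tokenize_with_nontoken(TEXT, r"[.,]"))
```

The first call drops every punctuation mark because none of them match the token regex. The second drops only the characters it was told to drop, which is why "phones!" comes through intact.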