It seems that the only limiting factor is that the regular expressions in the stop-word list are run on the individual words rather than the whole ngram.
Perhaps you could run count.pl without a stop-word list and then use e.g. sed to filter the results. This would allow you to apply regular expressions to the whole ngram.

--- In ngram@yahoogroups.com, Ted Pedersen <tpede...@...> wrote:
>
> This is a great question, and the short answer is NSP does not support
> this kind of stopword filtering (although it would clearly be a good
> thing to provide).
>
> As a quick review for others...
>
> The stopword mechanism in NSP allows you to either filter Ngrams that
> are completely made up of stopwords ('and' mode), or to filter Ngrams
> that contain one or more stop words ('or' mode) without regard to
> position. Which mode you get depends on how you set up your stoplist
> file...your stoplist file should start with mode, and then be followed
> by regular expressions representing the tokens you'd like to have
> considered as stop words...
>
> @stop.mode=AND
> /\bthe\b/
> /\bfor\b/
>
> or
>
> @stop.mode=OR
> /\bthe\b/
> /\bfor\b/
>
> The OR list would filter out "the united states" while the AND list
> would let that be used (since not all words are in the stoplist). If
> you don't specify the stop.mode you get AND by default...
>
> I'll note this as an excellent suggestion, and take a look at what
> would be involved in supporting it. For now though I can't think of a
> good way to do this with NSP.
>
> Cordially,
> Ted
>
> On Thu, Jan 29, 2009 at 11:11 AM, mercevg <merc...@...> wrote:
> > Dear all,
> >
> > I would like to know if it's possible with NSP not to filter stopwords
> > inside of tri-grams. In my results list I just want to filter
> > stopwords placed in the first and last position of a tri-gram.
> >
> > As an example, in a sentence like this:
> > "Data of variable length (the operand) is preceded by an opcode."
> >
> > I would like to get as a result list "data of variable" and not "Data
> > variable length".
> >
> > Best wishes,
> > Mercè
> >
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
>
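To make the post-filtering idea from the top of this message a bit more concrete, here is a rough Python sketch (Python rather than sed, only because the position test is easier to read). It is not part of NSP. It assumes the count.pl output has the total on the first line, followed by one ngram per line with tokens separated by "<>" and the counts in the last field. The names STOP_WORDS, is_stop and filter_edges are made up for illustration, and the stop-word list is just an example.

```python
import re

# Hypothetical stop-word patterns; replace with your own list.
STOP_WORDS = [re.compile(p) for p in (r"the", r"for", r"of", r"by", r"is", r"an")]


def is_stop(token):
    """True if the whole token (case-insensitively) matches a stop-word pattern."""
    return any(p.fullmatch(token.lower()) for p in STOP_WORDS)


def filter_edges(lines, sep="<>"):
    """Keep ngram lines whose first and last tokens are not stop words.

    Assumed line format: tok1<>tok2<>...<>counts
    Lines without the separator (e.g. the leading total) pass through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split(sep)
        if len(fields) < 2:
            yield line  # header or other non-ngram line
            continue
        tokens = fields[:-1]  # the last field holds the counts
        if is_stop(tokens[0]) or is_stop(tokens[-1]):
            continue  # stop word at the edge: drop the ngram
        yield line  # stop words in the middle are allowed


if __name__ == "__main__":
    sample = [
        "3",
        "data<>of<>variable<>1 1 1 1 1 1 1",
        "of<>variable<>length<>1 1 1 1 1 1 1",
        "variable<>length<>the<>1 1 1 1 1 1 1",
    ]
    for kept in filter_edges(sample):
        print(kept)
```

With the sample above, only the header line and "data<>of<>variable" survive, since "of" in the middle position is not checked. Note that the total on the first line is passed through unchanged, so it no longer matches the filtered list; if you feed the result to statistic.pl you would probably need to think about whether that matters for your measure.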