Ted,

Thanks for your explanations.
I agree: this kind of filter would be very useful to implement, especially when working with Romance languages. In my case, I have to compare bigram and trigram results in English, Spanish and Catalan, so this feature would be very helpful.

Congratulations on the new version of NSP.

Best wishes,
Mercè

--- In ngram@yahoogroups.com, Ted Pedersen <tpede...@...> wrote:
>
> This is a great question, and the short answer is NSP does not support
> this kind of stopword filtering (although it would clearly be a good
> thing to provide).
>
> As a quick review for others...
>
> The stopword mechanism in NSP allows you to either filter Ngrams that
> are completely made up of stopwords ('and' mode), or to filter Ngrams
> that contain one or more stop words ('or' mode) without regard to
> position. Which mode you get depends on how you set up your stoplist
> file...your stoplist file should start with mode, and then be followed
> by regular expressions representing the tokens you'd like to have
> considered as stop words...
>
> @stop.mode=AND
> /\bthe\b/
> /\bfor\b/
>
> or
>
> @stop.mode=OR
> /\bthe\b/
> /\bfor\b/
>
> The OR list would filter out "the united states" while the AND list
> would let that be used (since not all words are in the stoplist). If
> you don't specify the stop.mode you get AND by default...
>
> I'll note this as an excellent suggestion, and take a look at what
> would be involved in supporting it. For now though I can't think of a
> good way to do this with NSP.
>
> Cordially,
> Ted
>
> On Thu, Jan 29, 2009 at 11:11 AM, mercevg <merc...@...> wrote:
> > Dear all,
> >
> > I would like to know if it's possible with NSP not to filter stopwords
> > inside of tri-grams. In my results list I just want to filter
> > stopwords placed in the first and last position of a tri-gram.
> >
> > As an example, in a sentence like this:
> > "Data of variable length (the operand) is preceded by an opcode."
> >
> > I would like to get as a result list "data of variable" and not "Data
> > variable length".
> >
> > Best wishes,
> > Mercè
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
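To make the difference between the modes concrete, here is a minimal Python sketch (not part of NSP) that applies AND, OR, and the requested first/last-position filter to the trigrams of the example sentence. Everything here is an assumption for illustration: the stoplist patterns, the lowercase `\w+` tokenization that drops punctuation, the hypothetical "EDGE" mode name, and the helper functions. None of these are NSP options.

```python
import re

# Hypothetical stoplist in the spirit of NSP's one-regex-per-line format.
# These patterns are chosen for this example only, not taken from NSP.
STOP_PATTERNS = [re.compile(p) for p in (r"\bof\b", r"\bthe\b", r"\bis\b", r"\bby\b", r"\ban\b")]


def is_stop(token: str) -> bool:
    """Return True if the token matches any stoplist regex."""
    return any(p.search(token) for p in STOP_PATTERNS)


def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def keep(gram, mode):
    flags = [is_stop(t) for t in gram]
    if mode == "AND":   # drop only if every token is a stopword
        return not all(flags)
    if mode == "OR":    # drop if any token is a stopword, regardless of position
        return not any(flags)
    if mode == "EDGE":  # hypothetical mode: drop only if the first or last token is a stopword
        return not (flags[0] or flags[-1])
    raise ValueError(f"unknown mode: {mode}")


def filtered(text, n, mode):
    # Assumed tokenization: lowercase, keep runs of \w characters, drop punctuation.
    tokens = re.findall(r"\w+", text.lower())
    return [" ".join(g) for g in ngrams(tokens, n) if keep(g, mode)]


sentence = "Data of variable length (the operand) is preceded by an opcode."
for mode in ("AND", "OR", "EDGE"):
    print(mode, filtered(sentence, 3, mode))
```

Under these assumptions, OR mode discards every trigram in the sentence, AND mode keeps all nine, and the edge filter keeps "data of variable", "length the operand" and "operand is preceded" (because punctuation is dropped, a trigram can span the parentheses). For bigrams both tokens sit at an edge, so the edge filter behaves exactly like OR mode; the distinction only matters for trigrams and longer.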