In this case, the main problem is that there is no option letting the
user decide in which positions stopwords should be filtered.

To work around this and get my list of Spanish trigrams correctly
filtered, I ran count.pl without a stopword list and then filtered
the trigrams with a Perl program.
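A minimal sketch of that post-filtering step follows. It is written in Python rather than Perl, and it assumes count.pl's usual output layout: a first line holding the total ngram count, then one w1<>w2<>w3<>counts line per trigram. The stopword patterns (English here, to match the example sentence; a Spanish list would replace them) and the names is_stopword, filter_edges and filter_file are hypothetical. Only the first and last tokens are checked, so stopwords in the middle of a trigram are kept.

```python
import re
from typing import Iterable, Iterator

# Hypothetical stopword patterns (Python regexes standing in for a Perl
# stoplist such as /\bthe\b/); each must match a whole token.
STOP_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"^the$", r"^of$", r"^is$", r"^by$", r"^an$")
]


def is_stopword(token: str) -> bool:
    """True if the token matches any stopword pattern."""
    return any(p.search(token) for p in STOP_PATTERNS)


def filter_edges(lines: Iterable[str]) -> Iterator[str]:
    """Drop ngrams whose first or last token is a stopword.

    Assumes count.pl-style input: the first line is the total ngram
    count, and each later line is 'w1<>w2<>...<>wn<>counts'.
    Stopwords in inner positions are kept.
    """
    for i, line in enumerate(lines):
        if i == 0:
            yield line  # pass the total-count header through unchanged
            continue
        fields = line.rstrip("\n").split("<>")
        tokens = fields[:-1]  # the last field holds the frequency counts
        if tokens and (is_stopword(tokens[0]) or is_stopword(tokens[-1])):
            continue
        yield line


def filter_file(in_path: str, out_path: str) -> None:
    """Apply filter_edges to a count.pl output file (hypothetical paths)."""
    with open(in_path, encoding="utf-8") as src, open(
        out_path, "w", encoding="utf-8"
    ) as dst:
        dst.writelines(filter_edges(src))
```

For example, filter_file("trigrams.cnt", "trigrams.filtered.cnt") would keep "data<>of<>variable<>..." but drop "of<>variable<>length<>..."; both file names are placeholders. The header line is passed through unchanged.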

As Ted also said, this would be a very useful option to include in NSP.

Mercè   


--- In ngram@yahoogroups.com, "robsteranium" <robsteran...@...> wrote:
>
> It seems that the only limiting factor is that the regular expressions
> in the stop-word list are run on the individual words rather than the
> whole ngram.
> 
> Perhaps you could run count.pl without using a stop-word list then use
> e.g. sed to filter the results.  This would allow you to apply regular
> expressions to the whole ngram.
> 
> 
> 
> --- In ngram@yahoogroups.com, Ted Pedersen <tpederse@> wrote:
> >
> > This is a great question, and the short answer is NSP does not support
> > this kind of stopword filtering (although it would clearly be a good
> > thing to provide).
> > 
> > As a quick review for others...
> > 
> > The stopword mechanism in NSP allows you either to filter Ngrams that
> > are made up entirely of stopwords ('and' mode) or to filter Ngrams
> > that contain one or more stopwords ('or' mode), without regard to
> > position. Which mode you get depends on how you set up your stoplist
> > file: it should start with the mode, followed by regular expressions
> > representing the tokens you'd like to have treated as stopwords:
> > 
> > @stop.mode=AND
> > /\bthe\b/
> > /\bfor\b/
> > 
> > or
> > 
> > @stop.mode=OR
> > /\bthe\b/
> > /\bfor\b/
> > 
> > The OR list would filter out "the united states", while the AND list
> > would keep it (since not all of its words are in the stoplist). If
> > you don't specify stop.mode, you get AND by default.
> > 
> > I'll note this as an excellent suggestion and take a look at what
> > would be involved in supporting it. For now, though, I can't think of
> > a good way to do this with NSP.
> > 
> > Cordially,
> > Ted
> > 
> > On Thu, Jan 29, 2009 at 11:11 AM, mercevg <mercevg@> wrote:
> > > Dear all,
> > >
> > > I would like to know if it's possible with NSP not to filter
> > > stopwords inside trigrams. In my results list I just want to filter
> > > stopwords placed in the first and last positions of a trigram.
> > >
> > > As an example, in a sentence like this:
> > > "Data of variable length (the operand) is preceded by an opcode."
> > >
> > > I would like to get "data of variable" in my result list, and not
> > > "Data variable length".
> > >
> > > Best wishes,
> > > Mercè
> > >
> > > 
> > 
> > 
> > 
> > -- 
> > Ted Pedersen
> > http://www.d.umn.edu/~tpederse
> >
>
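To make the AND/OR distinction in Ted's quoted message concrete, here is a small Python model of the two modes. It assumes, as robsteranium notes, that each stoplist regex is tested against every token of the ngram separately. It is an illustration, not NSP's own code; the function names is_stop and removed, and the two patterns mirroring /\bthe\b/ and /\bfor\b/, are assumptions.

```python
import re
from typing import Sequence

# Hypothetical stand-ins for the stoplist entries /\bthe\b/ and /\bfor\b/.
STOPLIST = [re.compile(r"\bthe\b"), re.compile(r"\bfor\b")]


def is_stop(token: str) -> bool:
    """True if any stoplist regex matches this single token."""
    return any(p.search(token) for p in STOPLIST)


def removed(ngram: Sequence[str], mode: str = "AND") -> bool:
    """Return True if the ngram would be dropped under the given stop mode.

    AND: drop only if every token is a stopword (the default).
    OR:  drop if any token is a stopword, regardless of its position.
    """
    flags = [is_stop(tok) for tok in ngram]
    if mode == "AND":
        return all(flags)
    if mode == "OR":
        return any(flags)
    raise ValueError(f"unknown stop mode: {mode!r}")
```

With this model, removed(["the", "united", "states"], "OR") is True and removed(["the", "united", "states"], "AND") is False. Neither mode looks at token positions, which is why first/last-position filtering currently has to be done outside NSP.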

