This is a great question. The short answer is that NSP does not
currently support this kind of position-sensitive stopword filtering
(although it would clearly be a good thing to provide).

As a quick review for others...

The stopword mechanism in NSP allows you to either filter Ngrams that
are made up entirely of stopwords ('AND' mode), or to filter Ngrams
that contain one or more stopwords ('OR' mode), without regard to
position. Which mode you get depends on how you set up your stoplist
file. The file should start with the mode, followed by regular
expressions representing the tokens you'd like to have treated as
stopwords:

@stop.mode=AND
/\bthe\b/
/\bfor\b/

or

@stop.mode=OR
/\bthe\b/
/\bfor\b/

The OR list would filter out "the united states", while the AND list
would keep it (since not all of its words are in the stoplist). If
you don't specify @stop.mode, you get AND by default.
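
To make the two modes concrete, here is a minimal Python sketch of the
semantics described above. It is not NSP's actual code (NSP is written
in Perl); the function names load_stoplist and is_filtered are
hypothetical, and the parsing only handles the simple /regex/ lines
shown in the examples.

```python
import re

def load_stoplist(text):
    """Parse a stoplist: an optional @stop.mode line, then /regex/ lines."""
    mode = "AND"  # assumed default, per the description above
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("@stop.mode="):
            mode = line.split("=", 1)[1].strip().upper()
        elif len(line) >= 2 and line.startswith("/") and line.endswith("/"):
            # Strip the surrounding slashes and compile the regex.
            patterns.append(re.compile(line[1:-1]))
    return mode, patterns

def is_filtered(ngram, mode, patterns):
    """Return True if the Ngram would be removed under the given mode."""
    flags = [any(p.search(tok) for p in patterns) for tok in ngram]
    # AND: every token must be a stopword; OR: one stopword is enough.
    return all(flags) if mode == "AND" else any(flags)

STOPLIST_OR = "@stop.mode=OR\n/\\bthe\\b/\n/\\bfor\\b/\n"
STOPLIST_DEFAULT = "/\\bthe\\b/\n/\\bfor\\b/\n"  # no mode line, so AND

trigram = ("the", "united", "states")
or_mode, or_patterns = load_stoplist(STOPLIST_OR)
and_mode, and_patterns = load_stoplist(STOPLIST_DEFAULT)

print(is_filtered(trigram, or_mode, or_patterns))    # OR list
print(is_filtered(trigram, and_mode, and_patterns))  # AND list
```

Note that neither mode looks at where in the Ngram the stopword
occurs, which is why the request below can't be expressed this way.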

I'll note this as an excellent suggestion, and take a look at what
would be involved in supporting it. For now though I can't think of a
good way to do this with NSP.
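
For illustration only, the requested behavior (drop a tri-gram only
when its first or last token is a stopword) can be sketched outside
NSP in a few lines of Python. This is a rough sketch under stated
assumptions, not an NSP feature: the stopword set, the punctuation-free
tokenization, and the helper names are all hypothetical.

```python
# Hypothetical stopword set, chosen only for this example.
STOPWORDS = {"the", "of", "by", "an", "is"}

def ngrams(tokens, n=3):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def keep_ngram(ngram, stopwords=STOPWORDS):
    """Keep an Ngram unless its first or last token is a stopword.

    Stopwords in interior positions are deliberately ignored.
    """
    return (ngram[0].lower() not in stopwords
            and ngram[-1].lower() not in stopwords)

# The example sentence from the question, with punctuation removed.
tokens = "Data of variable length the operand is preceded by an opcode".split()
kept = [g for g in ngrams(tokens) if keep_ngram(g)]

for g in kept:
    print(" ".join(g))
```

With this stopword set, "Data of variable" survives because "of" sits
in the middle, while "of variable length" is dropped because it starts
with a stopword.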

Cordially,
Ted

On Thu, Jan 29, 2009 at 11:11 AM, mercevg <merc...@yahoo.es> wrote:
> Dear all,
>
> I would like to know if it's possible with NSP not to filter stopwords
> inside of tri-grams. In my results list I just want to filter
> stopwords placed in the first and last position of a tri-gram.
>
> As an example, in a sentence like this:
> "Data of variable length (the operand) is preceded by an opcode."
>
> I would like to get as a result list "data of variable" and not "Data
> variable length".
>
> Best wishes,
> Mercè
>
> 



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse