I would run count.pl without a stop list (or perhaps with just a normal
stop list) and then filter the output with another program (e.g. sed).
This one-liner would do the trick:

sed -ne '/XTOPOX/p' count-output.cnt
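
If you also want to keep the total count on the first line of the output, or to match XTOPOX only as a whole token rather than anywhere on the line, something like the following awk sketch might help. It assumes the layout in your example: the first line is the total, and each later line is tokens separated by <> followed by the frequency values. The function name filter_ngrams is just a made-up helper, not part of the N-gram package.

```shell
# Hypothetical helper (not part of NSP): keep the total-count line plus
# every n-gram in which the given token appears as a whole token.
filter_ngrams() {
    # $1 = token to keep, $2 = count.pl output file
    awk -F'<>' -v tok="$1" '
        NR == 1 { print; next }        # assumed: first line is the total count
        {
            # the last <>-separated field holds the frequencies, so skip it
            for (i = 1; i < NF; i++)
                if ($i == tok) { print; next }
        }
    ' "$2"
}

# Usage: filter_ngrams XTOPOX count-output.cnt
```

Note that the total on the first line still refers to the unfiltered n-grams, so any measures computed from the filtered file would need to take that into account.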
 


--- In ngram@yahoogroups.com, "ftwaroch" <f.a.twar...@...> wrote:
>
> Dear ngram group,
> 
> Thanks for a great tool! I started playing with count.pl a couple of
> days ago and wondered if it was possible to do the opposite of a
> stopword list. My intention is to create an n-gram file that contains
> only n-grams with a certain item (I investigate placenames in a text.)
>  I replaced all known placenames with a dummy value XTOPOX, and
> defined a stoplist file - 
> 
> @stop.mode=AND
> /[^XTOP]/
> 
> This is not a very clean approach, as all patterns that are not XTOP are
> returned and I get noise back as well; see the example:
> 
> example_out.txt
> 
> 16
> to<>XTOPOX<>2 2 4 
> XTOPOX<>.<>2 4 3 
> Tudur<>XTOPOX<>1 1 4 
> XTOPOX<>on<>1 4 1 
> XTOPOX<>,<>1 4 1 
> OF<>TO<>1 1 1 
> XX<>THE<>1 1 1 
> CHAPTER<>XX<>1 2 1 
> ,<>XTOPOX<>1 2 4 
> TO<>DAY<>1 1 1 
> X<>LLYWELYN<>1 1 1 
> T<>.<>1 1 3 
> ,<>T<>1 2 1 
> CHAPTER<>X<>1 2 1 
> 
> 
> Is there a way to do that? If you have any pointers for me I would be
> very happy.
> many thanks,
> 
> Florian
>

