Dear ngram group,

Thanks for a great tool! I started playing with count.pl a couple of
days ago and wondered whether it is possible to do the opposite of a
stopword list. My intention is to create an n-gram file that contains
only n-grams with a certain item (I am investigating placenames in a
text). I replaced all known placenames with the dummy value XTOPOX and
defined a stoplist file:

@stop.mode=AND
/[^XTOP]/

This is not a very clean approach: the character class keeps every
n-gram with a token made up only of the letters X, T, O and P, not just
XTOPOX, so I get noise back as well. See this example:

example_out.txt

16
to<>XTOPOX<>2 2 4 
XTOPOX<>.<>2 4 3 
Tudur<>XTOPOX<>1 1 4 
XTOPOX<>on<>1 4 1 
XTOPOX<>,<>1 4 1 
OF<>TO<>1 1 1 
XX<>THE<>1 1 1 
CHAPTER<>XX<>1 2 1 
,<>XTOPOX<>1 2 4 
TO<>DAY<>1 1 1 
X<>LLYWELYN<>1 1 1 
T<>.<>1 1 3 
,<>T<>1 2 1 
CHAPTER<>X<>1 2 1 
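
To illustrate where the noise seems to come from, here is a minimal Python sketch (not part of NSP). It assumes that in AND mode an n-gram is removed only when every one of its tokens matches a stop regex; the helper names is_stop and kept_in_and_mode are made up for this illustration.

```python
import re

# Stop regex from the stoplist above: it matches any token that contains
# at least one character other than X, T, O or P.
STOP = re.compile(r"[^XTOP]")

def is_stop(token):
    """True if the token counts as a stop word under /[^XTOP]/."""
    return STOP.search(token) is not None

def kept_in_and_mode(ngram):
    """Assumption: AND mode drops an n-gram only if ALL tokens are stop words."""
    return not all(is_stop(tok) for tok in ngram)

bigrams = [("to", "XTOPOX"), ("OF", "TO"), ("CHAPTER", "XX"), ("of", "the")]
for bg in bigrams:
    print(bg, "kept" if kept_in_and_mode(bg) else "removed")
```

Under that assumption, tokens such as TO, XX, X and T are never treated as stop words, because they consist only of the letters in the character class. Any bigram containing one of them survives alongside the XTOPOX bigrams.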


Is there a cleaner approach to this? If you have any pointers for me I would be
very happy.
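
One possible fallback, independent of the stoplist mechanism, would be to post-filter the count.pl output. The sketch below is only an assumption about how that could look: filter_ngrams is a hypothetical helper, and it relies on the output format shown above (tokens separated by <>, followed by the counts).

```python
def filter_ngrams(lines, target="XTOPOX"):
    """Keep only n-gram lines in which one token equals `target` exactly."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if "<>" not in line:
            continue  # skips the leading total-count line and blank lines
        tokens = line.split("<>")[:-1]  # last field holds the counts
        if target in tokens:
            kept.append(line)
    return kept

sample = [
    "16",
    "to<>XTOPOX<>2 2 4 ",
    "OF<>TO<>1 1 1 ",
    "XTOPOX<>.<>2 4 3 ",
]
for line in filter_ngrams(sample):
    print(line)
```

Note that the counts on the remaining lines would still be the ones computed over the whole text, and the total on the first line is dropped, so the filtered file is not a drop-in replacement for a real count.pl output.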
Many thanks,

Florian

