My approach is to index such pages anyway but 'tag' them. I use an
indexing filter for that and give the wanted pages a different tag from
the unwanted ones, for example category:nugget versus category:trash.
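A minimal sketch of the tagging decision, in plain Java and deliberately without the Nutch plugin interfaces. The class name, the method, and the URL pattern are hypothetical; in a real indexing filter the returned value would be added to the document as the category field.

```java
import java.util.regex.Pattern;

public class CategoryTagger {
    // Hypothetical pattern (an assumption): URLs that should appear in search results.
    private static final Pattern WANTED =
            Pattern.compile("^https?://[^/]+/articles/.*");

    // Returns the value to store in the "category" field for the given URL.
    public static String categoryFor(String url) {
        return WANTED.matcher(url).matches() ? "nugget" : "trash";
    }

    public static void main(String[] args) {
        System.out.println(categoryFor("http://example.com/articles/1"));
        System.out.println(categoryFor("http://example.com/login"));
    }
}
```

Every page still gets indexed this way; only the tag differs, so the decision can be changed later without recrawling.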
Then I also use a query filter plugin for the category field, and in the
UI I extend the user's query:
queryString + " category:nugget"
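As a rough sketch of that UI step in plain Java (the class and method names are hypothetical, and this assumes the category field is searchable through a query filter plugin):

```java
public class QueryExtender {
    // Appends the category restriction to whatever the user typed.
    public static String restrictToNuggets(String queryString) {
        return queryString.trim() + " category:nugget";
    }

    public static void main(String[] args) {
        System.out.println(restrictToNuggets("apache lucene"));
    }
}
```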
So you will have only non-trash pages in your results. I guess you
can also use the prune tool to remove such trash pages from the index
if you like.
HTH
Stefan
On 14.02.2006 at 08:11, Elwin wrote:
When using Nutch to crawl some sites, I want to index fetched contents
selectively, only when the URLs of these contents match my filter. For
other URLs I just want Nutch to crawl and parse them without indexing
them.
How can I achieve this? Which extension point should I extend?