Hi - you can use a different regex file at the indexing stage. See nutch-default.xml for the configuration directive, and use -Dparam=val to override the default regex-urlfilter.txt file for the indexing job.

Markus
-----Original message-----
> From: Albinscode <[email protected]>
> Sent: Friday 26th September 2014 11:25
> To: [email protected]
> Subject: Url post filtering
>
> Hello everybody,
>
> I usually filter URLs before the fetch operation using regex-filter,
> to avoid crawling the whole World Wide Web.
>
> I've got a specific need: one main page lists all the URLs to crawl. I
> want to crawl the main page to get its outlinks, but I don't want to
> index this page. How can I proceed?
>
> I could implement this in my own plugin, but I want to be sure
> nothing like it already exists ;)
> A dirty solution would be to delete the main page URL from the
> generated Solr index with a JSON query, but yeah, that is really dirty ;)
>
> Hope I'm clear.
>
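To check that an index-time rules file drops only the main page, the filter semantics can be mirrored in a few lines. This is a Python sketch, not Nutch code; it assumes the behaviour described in the comments of Nutch's regex-urlfilter.txt (lines start with '+' or '-', '#' lines are comments, the first matching pattern decides, and a URL matching no pattern is rejected). The main page URL and the file name index-urlfilter.txt are hypothetical. Presumably the directive Markus means is urlfilter.regex.file, so the index run would look roughly like `bin/nutch index -Durlfilter.regex.file=index-urlfilter.txt ... -filter`; check nutch-default.xml and your version's usage output before relying on that.

```python
import re

def parse_rules(text):
    """Parse regex-urlfilter style lines into (accept, compiled_pattern) pairs.

    Assumed format: '+' (accept) or '-' (reject) followed by a regex;
    blank lines and '#' comments are skipped, anything else is an error.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"invalid rule: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accept(url, rules):
    """The first rule whose pattern is found in the URL decides; no match rejects."""
    for is_accept, pattern in rules:
        if pattern.search(url):
            return is_accept
    return False

# Hypothetical index-time rules (e.g. saved as index-urlfilter.txt):
# skip the page that only lists the URLs to crawl, keep everything else.
INDEX_RULES = r"""
# the main/seed page (hypothetical URL)
-^https?://www\.example\.com/main\.html$
# accept anything else
+.
"""

if __name__ == "__main__":
    rules = parse_rules(INDEX_RULES)
    for url in ["http://www.example.com/main.html",
                "http://www.example.com/articles/1.html"]:
        print(url, "->", "index" if accept(url, rules) else "skip")
```

The crawl itself keeps using the default regex-urlfilter.txt, so the main page is still fetched and its outlinks are followed; only the indexing job sees the stricter file.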

