Did you make sure to include the filters in your plugin settings (conf/nutch-site.xml)?
I must admit, I haven't checked whether the plugin is used during the fetch, only when you update the DB, or both.

-----Original Message-----
From: EM <[EMAIL PROTECTED]>
To: nutch-user@incubator.apache.org
Date: Thu, 14 Apr 2005 12:35:09 -0400
Subject: How can I limit my fetching process?

> Hi,
>
> I cannot make nutch obey my preferences what to fetch from internet.
>
> In the regex-urlfilter.txt and crawl-urlfilter.txt I have a line
> stating:
>
> +^http://([a-z0-9]*\.)*.mk/
>
> With which I hope to return all (and only) pages from the .mk domain.
>
> However, when I try to run my fetch.sh script:
> --------
> bin/nutch generate db segments
> s3=`ls -d segments/2* | tail -1`
> echo $s3
>
> bin/nutch fetch $s3
> bin/nutch updatedb db $s3
> bin/nutch analyze db 2
> bin/nutch index $s3
> bin/nutch dedup segments dedup.tmp
> --------
>
> I can see the fetcher returning .com domains also.
>
> How can I limit my fetching process? Am I missing the obvious (did
> something wrong with the fetch script)?
>
> Emilijan
>
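
One more thing about the filter line quoted above: the "." in front of "mk" is unescaped, and the group before it already consumes the dot that ends each host label, so the pattern seems to require one extra character between the last dot and "mk". Here is a rough way to check that outside of Nutch. It is only a sketch: grep -E stands in for the regex engine Nutch uses (an assumption, though both should treat this particular pattern the same way), the hostnames are made up, check_url is a hypothetical helper, and the "adjusted" pattern is my suggestion rather than anything from the Nutch docs.

```shell
#!/bin/sh
# Sketch only: grep -E is a stand-in for Nutch's regex matching (assumption).
# The hostnames below are invented for illustration.

original='^http://([a-z0-9]*\.)*.mk/'   # pattern from the quoted message, without the leading "+"
adjusted='^http://([a-z0-9]*\.)*mk/'    # same pattern with the extra "." before "mk" removed

# check_url PATTERN URL: print "match" or "no match" (hypothetical helper)
check_url() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Compare both patterns on two .mk URLs and one .com URL.
for url in http://www.example.mk/ http://example.mk/index.html http://www.example.com/; do
    echo "$url  original: $(check_url "$original" "$url")  adjusted: $(check_url "$adjusted" "$url")"
done
```

With these sample URLs the original pattern matches none of the three, while the adjusted one matches both .mk URLs and rejects the .com one. This would not by itself explain why .com pages get fetched, so the question of whether the filter is applied at all still stands.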