Hi Nutch developers,

Is there any way to write a URL filter that allows only certain URLs to be fetched? I would like Nutch to follow only the URLs that I allow, while the seed URLs still get analyzed further.

There are already several plugins that support URL filtering, each of which lets you specify the URLs to accept in a different way. See the following plugins:

urlfilter-automaton
urlfilter-domain
urlfilter-prefix
urlfilter-regex
urlfilter-suffix
urlfilter-validator

Which one(s) to use depends on your particular goals.
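As a rough illustration, an allow-list for urlfilter-regex might look like the fragment below. This is a sketch that assumes urlfilter-regex is enabled in plugin.includes and reads its rules from the default conf/regex-urlfilter.txt. The host names and paths are placeholders, not recommendations. Each rule is a '+' (accept) or '-' (reject) followed by a Java regular expression, and the first rule that matches a URL decides its fate.

```
# Hypothetical allow-list for conf/regex-urlfilter.txt (placeholder hosts).
# Rules are checked top to bottom; the first matching rule wins.
+^https?://www\.example\.com/docs/
+^https?://wiki\.example\.org/
# Reject everything else.
-.
```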

If none of these would work for you, then you can always create a new plugin that implements the URLFilter interface.
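A minimal sketch of the filtering logic such a plugin could contain is below. It follows the URLFilter contract, where filter() returns the URL to accept it or null to reject it. To keep it self-contained it does not depend on the Nutch jars. A real plugin would implement org.apache.nutch.net.URLFilter (which also requires the Configurable methods) and be declared in the plugin's plugin.xml. The class name and the hard-coded allow-list are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical allow-list filter mirroring the URLFilter contract:
 * return the URL to accept it, or null to reject it.
 * Nutch-specific wiring (implementing org.apache.nutch.net.URLFilter,
 * setConf/getConf, plugin.xml) is intentionally omitted.
 */
public class AllowListUrlFilter {

    // Hypothetical allow-list; a real plugin would load this from its configuration.
    private final List<String> allowedPrefixes;

    public AllowListUrlFilter(List<String> allowedPrefixes) {
        this.allowedPrefixes = List.copyOf(allowedPrefixes);
    }

    /** Returns urlString unchanged if it starts with an allowed prefix, otherwise null. */
    public String filter(String urlString) {
        if (urlString == null) {
            return null;
        }
        for (String prefix : allowedPrefixes) {
            if (urlString.startsWith(prefix)) {
                return urlString; // on the allow-list: accept
            }
        }
        return null; // not on the allow-list: reject
    }

    public static void main(String[] args) {
        AllowListUrlFilter f = new AllowListUrlFilter(
                List.of("https://www.example.com/docs/", "https://wiki.example.org/"));
        System.out.println(f.filter("https://www.example.com/docs/index.html")); // accepted
        System.out.println(f.filter("https://ads.example.net/banner"));          // prints "null"
    }
}
```

Plain prefix matching like this is already what urlfilter-prefix provides, so a custom plugin is mainly worth writing when the decision needs logic the existing plugins can't express.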

-- Ken
--
Ken Krugler
+1 530-210-6378
