Hi all,

I'd like to extend Nutch with some of the features of YaCy
(http://www.yacy.net/yacy) so that Nutch can act as a caching HTTP proxy.
Basically, the user requests a URL; if an up-to-date version is in the
cache, it is returned, otherwise the URL is fetched, cached and then
returned (I'd also like to receive the parsed text version). The new URL
should be added to the index, but this should not block the initial
request. The best way I can see to do this is to add a listener that is
notified when a URL request has been satisfied and added to the cache (or
when the request fails).
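
To make the idea concrete, here is a rough sketch of the flow described above. It uses none of Nutch's real classes: CachingProxySketch, FetchListener, the fetcher function and the max-age parameter are all hypothetical names made up for illustration, and the cache is a plain in-memory map. The fetch happens on the caller's thread and listeners are notified on a background thread, so indexing work hung off a listener would not hold up the response. Parsing is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch only; none of these types exist in Nutch.
class CachingProxySketch {

    /** Hypothetical callback fired after a URL is cached, or after a fetch fails. */
    interface FetchListener {
        void onCached(String url, String content);
        void onFailed(String url, Exception cause);
    }

    /** Cached body plus the time it was fetched. */
    private static final class Entry {
        final String content;
        final long fetchedAt;
        Entry(String content, long fetchedAt) {
            this.content = content;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final List<FetchListener> listeners = new CopyOnWriteArrayList<>();
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    private final Function<String, String> fetcher; // stand-in for a real HTTP fetch
    private final long maxAgeMillis;                // "up to date" means younger than this

    CachingProxySketch(Function<String, String> fetcher, long maxAgeMillis) {
        this.fetcher = fetcher;
        this.maxAgeMillis = maxAgeMillis;
    }

    void addListener(FetchListener listener) {
        listeners.add(listener);
    }

    /** Returns the cached copy if it is fresh, otherwise fetches, caches and returns. */
    String get(String url) {
        long now = System.currentTimeMillis();
        Entry cached = cache.get(url);
        if (cached != null && now - cached.fetchedAt <= maxAgeMillis) {
            return cached.content;
        }
        try {
            String content = fetcher.apply(url);
            cache.put(url, new Entry(content, now));
            // Notify off the request thread so indexing cannot block the caller.
            background.execute(() -> listeners.forEach(l -> l.onCached(url, content)));
            return content;
        } catch (RuntimeException ex) {
            background.execute(() -> listeners.forEach(l -> l.onFailed(url, ex)));
            throw ex;
        }
    }

    /** Stops the background thread after pending notifications have run. */
    void close() {
        background.shutdown();
        try {
            background.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        CachingProxySketch proxy =
            new CachingProxySketch(url -> "<html>stub for " + url + "</html>", 60_000L);
        proxy.addListener(new FetchListener() {
            @Override public void onCached(String url, String content) {
                System.out.println("would index: " + url);
            }
            @Override public void onFailed(String url, Exception cause) {
                System.out.println("fetch failed: " + url);
            }
        });
        proxy.get("http://example.com/"); // fetched, cached, listener notified
        proxy.get("http://example.com/"); // served from the cache, no notification
        proxy.close();
    }
}
```

In this sketch a listener only hears about new or refreshed entries, not cache hits, which seems to be what the indexer would want. Whether something like this could hook into Nutch's existing fetch and index steps is exactly the part I'm unsure about.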

Does anyone know of any similar work they can point me towards, or have
any views on the feasibility of this idea? Generally I think listeners
would be an interesting addition to Nutch, but any better or easier
suggestions for implementing the above are warmly welcomed.

Neil
