Hi all, I'd like to extend Nutch with some of the features of Yacy (http://www.yacy.net/yacy) so that Nutch can act as a caching HTTP proxy. Basically, the user requests a URL; if an up-to-date version is in the cache, it is returned, otherwise the URL is fetched, cached and then returned (I'd also like to receive the parsed text version). The new URL should also be added to the index, but indexing should not block the initial request. The best way I can see to do this is to add a listener that is notified when a URL request has been satisfied and the page added to the cache (or when the request fails).
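To make the flow above concrete, here is a rough Java sketch of what I have in mind. Everything in it is an assumption for illustration: CachingProxySketch, FetchListener, onCached and onFailed are hypothetical names and not part of Nutch's API, the fetcher is a plain stand-in function rather than Nutch's fetcher, freshness is a simple max-age check, and the parsed text version is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical callback; an indexer would implement this. Not a Nutch interface. */
interface FetchListener {
    void onCached(String url, String content);
    void onFailed(String url, Exception cause);
}

class CachingProxySketch {
    /** Cached page body plus the time it was fetched. */
    private static final class Entry {
        final String content;
        final long fetchedAtMillis;
        Entry(String content, long fetchedAtMillis) {
            this.content = content;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // stand-in for a real HTTP fetch
    private final long maxAgeMillis;                // assumed freshness rule
    private final FetchListener listener;
    private final Executor background;              // runs listener callbacks

    CachingProxySketch(Function<String, String> fetcher, long maxAgeMillis,
                       FetchListener listener, Executor background) {
        this.fetcher = fetcher;
        this.maxAgeMillis = maxAgeMillis;
        this.listener = listener;
        this.background = background;
    }

    /** Returns the cached copy if fresh; otherwise fetches, caches and returns. */
    String get(String url) {
        long now = System.currentTimeMillis();
        Entry cached = cache.get(url);
        if (cached != null && now - cached.fetchedAtMillis <= maxAgeMillis) {
            return cached.content; // cache hit: no fetch, no notification
        }
        final String content;
        try {
            content = fetcher.apply(url);
        } catch (RuntimeException e) {
            // Report the failure to the listener, then propagate it to the caller.
            background.execute(() -> listener.onFailed(url, e));
            throw e;
        }
        cache.put(url, new Entry(content, now));
        // Hand the notification to the executor so that indexing work
        // does not have to finish before the caller gets the page.
        background.execute(() -> listener.onCached(url, content));
        return content;
    }

    public static void main(String[] args) {
        ExecutorService background = Executors.newSingleThreadExecutor();
        FetchListener indexer = new FetchListener() {
            public void onCached(String url, String content) {
                System.out.println("would index: " + url);
            }
            public void onFailed(String url, Exception cause) {
                System.out.println("fetch failed: " + url);
            }
        };
        CachingProxySketch proxy = new CachingProxySketch(
                url -> "<html>stub page for " + url + "</html>",
                60_000L, indexer, background);
        System.out.println(proxy.get("http://example.com/")); // fetched and cached
        System.out.println(proxy.get("http://example.com/")); // served from cache
        background.shutdown(); // queued listener callbacks still run before exit
    }
}
```

Passing the executor in keeps the blocking question out of the proxy itself: a single background thread gives the non-blocking behaviour I'm after, while something like Runnable::run would make the callbacks synchronous, which is handy for testing.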
Does anyone know of any similar work they can point me towards, or have any views on the feasibility of this idea? In general I think listeners would be an interesting addition to Nutch, but any better or easier suggestions for implementing the above are warmly welcomed.
Neil
