On 30.03.2004 02:41, Gustavo Nalle Fernandes wrote:
 Thanks for the code! It is indeed very simple! That's why I like Cocoon :)
  Regarding the Last-Modified header, getLastModified() does work for GET
requests, but the GET request
also brings the whole document and not just the headers. That's why I was
observing the whole document being
transferred all the time.

Ah, of course. Now it's obvious :) getLastModified() is used only for Cocoon's pipeline caching, since pipeline processing is assumed to be the most time-consuming part. Of course, that assumption no longer holds when you fetch the content from a remote site.


So what is the best scenario for the
HTMLGenerator? Always do a HEAD request to see if the remote document is
modified and, if it is, make a subsequent GET request, OR always make a GET on
every request? It depends on the size of the document and the modification
frequency. If the remote document is too large, it is inefficient to make a
GET all the time, as the HTMLGenerator does today. On the other hand, if the
document is modified frequently, it would be inefficient to make HEAD and
GET requests, since it means making two connections to the remote site. Using
a sitemap parameter specifying the interval at which the HTMLGenerator would
fetch data would address both issues. Do you think it is worthwhile to change
the current HTMLGenerator to include this extra parameter?

Definitely not, as this problem is not HTMLGenerator-specific but URLSource-specific. So I will also raise this question on the dev list; maybe someone has a clever proposal for this.
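
To make the trade-off discussed above concrete, here is a minimal sketch in plain Java of the "check only every N milliseconds" idea. It is not Cocoon code: the class IntervalRevalidator and its methods are hypothetical names made up for illustration, and it only uses java.net.HttpURLConnection for the HEAD request. It assumes the remote server sends a usable Last-Modified header; if it does not, the sketch treats the document as changed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper (not part of Cocoon): limits how often a remote document is revalidated. */
class IntervalRevalidator {
    private final long intervalMillis;
    private long lastCheck = -1;    // time of the last remote check, -1 means "never checked"
    private long lastModified = 0;  // Last-Modified value seen at the last check, 0 means "unknown"

    IntervalRevalidator(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** True if nothing was checked yet or the configured interval has elapsed. */
    boolean isCheckDue(long now) {
        return lastCheck < 0 || now - lastCheck >= intervalMillis;
    }

    /**
     * Records the result of a remote check.
     * Returns true if the document must be fetched again, i.e. the
     * Last-Modified value is unknown (0) or differs from the stored one.
     */
    boolean recordCheck(long now, long remoteLastModified) {
        lastCheck = now;
        boolean changed = remoteLastModified == 0 || remoteLastModified != lastModified;
        lastModified = remoteLastModified;
        return changed;
    }

    /** Issues a HEAD request and returns the Last-Modified value, or 0 if the server sent none. */
    static long headLastModified(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("HEAD");
        try {
            return conn.getLastModified();
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller would first ask isCheckDue(System.currentTimeMillis()); only if that returns true would it call headLastModified(url) and pass the result to recordCheck(...), doing the full GET only when recordCheck returns true. Within the interval, no connection to the remote site is made at all, which is the point of the proposed parameter.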


For the devs with clever ideas here's the thread (unfortunately RES breaks the thread view at marc.theaimsgroup.com, so switching to gmane.org):
http://thread.gmane.org/gmane.text.xml.cocoon.user/34445


Joerg
