Geoffrey Young wrote:

please take the rest of this as just a friendly discussion - I don't want it
to turn into some kind of bickering match, since that's definitely not what
I have in mind :)

Cool, no problem - this is quite a complex thing, and I was struggling to make clear exactly what needed to be done, where, and why.

ok, that isn't the idea I had about output filters at all.  my own concept
of how this all worked (or should work) is that content handlers are
supposed to just generate content.  specifically, they should not care at
all about RFC compliance - this is why we have a separate header filter,
byterange filter, and so on (and why I think ap_set_last_modified foo should
be in its own filter ;)

For very simple content handlers, such as a handler that serves content stored in a file on disk, the above is true: the handler doesn't care much about HTTP, which is mostly handled by higher layers.

The problem starts creeping in when the content handler is less trivial than the file serving handler, such as mod_proxy, which receives an HTTP request from the input filter stack, and returns an HTTP response to the output filter stack based on content and headers generated by a backend server.

In this case, we're not just feeding content up the stack, but content _and_ HTTP headers. Filters cannot ignore the headers, otherwise broken behaviour is the result. A classic example is a filter that changes the length of the content (mod_gzip, or mod_include). These filters need to concern themselves with the HTTP Content-Length header, otherwise a response from mod_proxy going up the stack could get shipped to the browser with the wrong Content-Length.

In most cases, handling the headers is trivial for a filter. mod_gzip might strip off the Content-Length header in the hope that a later filter will chunk the response. mod_include should (in the simplest case) strip any Range header from the request in the hope that the byte range filter handles the range request down the line.
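
To make the Content-Length point concrete, here is a minimal sketch in Python rather than against the real httpd filter API. The function name gzip_filter and the use of a plain dict for the headers are my own assumptions for illustration; this is not how mod_gzip is actually written.

```python
import gzip


def gzip_filter(headers, body):
    """Hypothetical length-changing output filter (illustration only)."""
    # The upstream Content-Length describes the uncompressed body, so it
    # would be wrong once the body is compressed. Drop it and let a later
    # stage either recompute it or chunk the response.
    out = {k: v for k, v in headers.items() if k.lower() != "content-length"}
    out["Content-Encoding"] = "gzip"
    return out, gzip.compress(body)


# A response as it might arrive from a proxy-style handler: content AND headers.
upstream_body = b"<p>hello</p>" * 400
upstream_headers = {
    "Content-Type": "text/html",
    "Content-Length": str(len(upstream_body)),
}

new_headers, new_body = gzip_filter(upstream_headers, upstream_body)
```

If the filter had left the header alone, the browser would have been told to expect len(upstream_body) bytes while receiving len(new_body) bytes.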

But in the case of mod_proxy, mod_jk, etc., it is both valid and very desirable for a range request to be passed all the way to the backend, in the hope that the backend sends just that range back to mod_proxy, which in turn sends it up a filter stack that isn't going to fall over because it received a 206 Partial Content response.
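
As a rough sketch of what "not falling over" could look like, the Python below models a byte range cleanup step that only acts when the handler returned a full 200 response, and passes a handler-generated 206 through untouched. The name byterange_filter, the tuple-based interface, and the restriction to a single "bytes=first-last" range are all simplifying assumptions of mine, not the behaviour of the real byterange filter.

```python
def byterange_filter(request_headers, status, headers, body):
    """Hypothetical cleanup filter: apply a range only if nobody has yet."""
    range_header = request_headers.get("Range")
    # A 206 (or any non-200) from the handler means the range was already
    # dealt with further down, so leave the response alone.
    if status != 200 or range_header is None:
        return status, headers, body

    unit, _, spec = range_header.partition("=")
    first_text, _, last_text = spec.partition("-")
    supported = (unit == "bytes" and first_text.isdigit()
                 and (last_text == "" or last_text.isdigit()))
    if not supported:
        # Multi-range and suffix forms are ignored in this sketch.
        return status, headers, body

    total = len(body)
    first = int(first_text)
    last = min(int(last_text), total - 1) if last_text else total - 1
    if first >= total:
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    if last < first:
        return status, headers, body  # invalid range: serve the full body

    part = body[first:last + 1]
    out = dict(headers)
    out["Content-Range"] = f"bytes {first}-{last}/{total}"
    out["Content-Length"] = str(len(part))
    return 206, out, part


full_body = bytes(range(100))
request = {"Range": "bytes=10-19"}

# Simple handler: ignored the Range header and returned everything.
from_simple_handler = byterange_filter(request, 200, {}, full_body)

# Proxy-style handler: the backend already honoured the range.
partial_headers = {"Content-Range": "bytes 10-19/100", "Content-Length": "10"}
from_proxy_handler = byterange_filter(request, 206, partial_headers, full_body[10:20])
```

Both paths end with the same 206 response going to the client; the difference is only in how many bytes the handler had to fetch to produce it.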

that's true if I'm wrong about the assumption above.  but in my mind, the
filter API is the most useful if content handlers (and content-altering
filters) can remain ignorant of 206 responses and the byterange filter can
bat cleanup.

For simplicity's sake the above is a noble goal - but one with significant performance drawbacks in many real-world applications.

Apart from the mod_proxy case, think of a webserver (or bank of webservers) serving content hosted on an NFS server. The entire 650MB ISO file (for example) needs to be transferred from the NFS server to the webserver for every hit to that file - even when a user is resuming a download (which, for a file the size of an ISO, is likely to happen often).
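
A back-of-the-envelope version of that cost, as a small Python sketch. The 600MB resume point is a figure I have made up for the example, and backend_bytes is a hypothetical helper, not anything in httpd.

```python
MIB = 1024 * 1024


def backend_bytes(file_size, resume_offset, handler_honours_range):
    """Bytes pulled from the backend (e.g. NFS) to serve one resumed download."""
    if handler_honours_range:
        # Only the missing tail is read from the backend.
        return file_size - resume_offset
    # The handler produces the whole file and a later filter discards
    # everything before the requested offset.
    return file_size


iso_size = 650 * MIB
already_downloaded = 600 * MIB  # assumed resume point, for illustration

with_range = backend_bytes(iso_size, already_downloaded, True)
without_range = backend_bytes(iso_size, already_downloaded, False)
```

Under these assumed numbers the handler that understands ranges moves 50MB across the network, while the one that leaves it all to the filter moves the full 650MB.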

sure :)  I guess where we have different ideas, then, is in who exactly
should be responsible for RFC compliance.  I had always assumed that there
was (or should be) very little that a content handler needed to worry about
in this respect, and that it was the job of the core server engine (via
various early or late-running filters) to take care of things like HEAD
requests, HTTP/0.9 requests/responses, chunked encoding, range requests, etc.

The above is still true - there is (and should be) very little for the content handler to worry about when it comes to HTTP compliance, and content handlers should have the option to just generate content, as they do now.

The problem, though, is not with the content handlers but with the filters - filters must not assume that all content handlers serve only content and not HTTP headers. When a content handler decides that it wants to handle more of the HTTP spec in order to improve performance, it should be free to do so, and should not be prevented from doing so by limitations in the output filters.

In other words, if mod_proxy is taught how to pass Range requests to the backend server, the output filter stack should not stop it from doing so by removing Range headers unless that is absolutely necessary.

Regards,
Graham
--
