All right, getting back to this after a week off. I've tried to combine
feedback as best I can into one message.
Bill, you wrote:
> I'm perfectly happy to translate such values to GMT for non-HTTP
> inputs, per spec. If we are going to do so for HTTP inputs, loudly
> scolding the errant developer in the logs seems prudent, for their own
> longer-term benefit.
I suspect this is the core of the argument now. In my opinion, CGI is
not an HTTP input -- either as far as the spec is concerned (for spec
lawyering purposes) or in practice. It is a separate (HTTP-like)
protocol, and an implementation detail of the server. In other words, we
are not "proxying" to a CGI app; the CGI app is contained by the server
and is providing inputs to the server response.
That is a separate argument from "is it wise to correct GMT timestamps",
though.
> 1. Why do you/reporter want HTTP applications to persist in writing code
> which breaks between different transport providers/cgi hosting environments?
> The language has been crystal clear for 2 decades. We do a huge disservice
> to the PHP author community to let them be idiots. Alternately, the PHP
> SAPI itself could rectify this. (We aren't talking about non-HTTP sources.)
I'm not sure where PHP enters the conversation. They are only one (large
and important!) CGI producer; we're talking about our behavior with
*all* CGI applications here.
I do like your argument that we should do as little transformation as
possible, in order to facilitate moving CGI apps between environments.
Implementation differences are nasty for everyone. But I'm not convinced
that ship hasn't sailed; currently, it looks like we modify outgoing CGI
responses in order to merge headers, normalize Content-Type, and produce
304 (Not Modified) and 412 (Precondition Failed) responses.
There may be others I have missed, but this doesn't look like the
behavior of a server that considers itself a transparent "passthrough"
to a CGI application. (Isn't that what CGI-NPH is for?) But! I could
definitely be swayed otherwise, if that's what we'd like to do moving
forwards. I think both sides have potential value, but we should choose one.
> If there is date input that we cannot handle, the
> spec strongly encourages us to interpret it as now(), provided we have a
> clock (which all of our architectures do.)
In the absence of a quote from the spec, I'm still in strong
disagreement with this, based on the language I quoted last week.
Moving on to Stefan's comments:
> If we see CGI as a kind of input that is not strictly regulated by
> HTTP header formats (and that is an if), we should correct timezone
> offset to GMT, but otherwise leave the time unchanged. It might be our
> clock that has the issue. Meddling with it will not help anyone
> debugging problems.
+1 (and I am currently of the opinion that CGI is not a strict HTTP
input, as stated above).
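A minimal standalone sketch of what "correct the offset to GMT, but otherwise leave the time unchanged" could look like. This is plain C, not httpd or APR code: last_modified_to_gmt() and days_from_civil() are hypothetical names, and the sketch assumes the input has the "Day, DD Mon YYYY HH:MM:SS zone" layout with a zone of GMT, UT, or a numeric +hhmm/-hhmm offset. Anything else is rejected rather than guessed at, and the server clock is never consulted.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Days since 1970-01-01 for a Gregorian calendar date (m is 1..12). */
static long long days_from_civil(int y, int m, int d)
{
    long long era, yoe, doy, doe;
    y -= (m <= 2);
    era = (y >= 0 ? y : y - 399) / 400;
    yoe = y - era * 400;
    doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Hypothetical helper: rewrite e.g. "Tue, 15 Nov 1994 08:12:31 -0500" as the
 * same instant expressed in GMT. Returns 0 on success, -1 if the input is
 * not understood. The weekday in the input is ignored and recomputed. */
static int last_modified_to_gmt(const char *in, char *out, size_t outlen)
{
    static const char *const months[12] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    char wday[4], mon[4], zone[6];
    int day, year, hh, mm, ss, m, i;
    long long offset = 0, epoch;
    time_t t;
    struct tm *gmt;

    if (sscanf(in, "%3s, %d %3s %d %d:%d:%d %5s",
               wday, &day, mon, &year, &hh, &mm, &ss, zone) != 8)
        return -1;
    for (m = 0; m < 12 && strcmp(mon, months[m]) != 0; m++)
        ;
    if (m == 12 || day < 1 || day > 31 || hh < 0 || hh > 23 ||
        mm < 0 || mm > 59 || ss < 0 || ss > 60)
        return -1;

    if (strcmp(zone, "GMT") != 0 && strcmp(zone, "UT") != 0) {
        /* Expect a sign followed by exactly four digits: +hhmm or -hhmm. */
        if ((zone[0] != '+' && zone[0] != '-') || strlen(zone) != 5)
            return -1;
        for (i = 1; i < 5; i++)
            if (zone[i] < '0' || zone[i] > '9')
                return -1;
        offset = ((zone[1] - '0') * 10 + (zone[2] - '0')) * 3600LL
               + ((zone[3] - '0') * 10 + (zone[4] - '0')) * 60LL;
        if (zone[0] == '-')
            offset = -offset;
    }

    /* Local wall-clock time minus its UTC offset gives the UTC instant. */
    epoch = days_from_civil(year, m + 1, day) * 86400LL
          + hh * 3600LL + mm * 60LL + ss - offset;
    t = (time_t)epoch;
    gmt = gmtime(&t);           /* not thread-safe; fine for a sketch */
    if (gmt == NULL)
        return -1;
    /* %a and %b are English in the default "C" locale. */
    return strftime(out, outlen, "%a, %d %b %Y %H:%M:%S GMT", gmt) ? 0 : -1;
}
```

Called with "Tue, 15 Nov 1994 08:12:31 -0500" and a 64-byte buffer, this writes "Tue, 15 Nov 1994 13:12:31 GMT". A value that is already in GMT comes back unchanged.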
> If the value is unparseable, we should log it and suppress sending
> out a "Last-Modified" completely. Also any "If-*" checking should
> behave as if the header was not present.
+1.
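The "log it and drop the header" behavior could be sketched roughly as follows. This is standalone C for illustration only: struct cgi_response, is_http_fixdate() and set_last_modified_from_cgi() are hypothetical names, not httpd structures or APIs, and the check validates only the shape of the fixed-length GMT date form (not month names or field ranges). In this sketch a non-GMT value counts as unparseable; a real implementation that also corrects offsets would try that first.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the response state a server would track. */
struct cgi_response {
    int  has_last_modified;   /* 0 => header is not sent; If-* checks skip it */
    char last_modified[32];
};

/* Shape check for the fixed-length HTTP date form, e.g.
 * "Sun, 06 Nov 1994 08:49:37 GMT". In the template, 'a' means any letter,
 * '9' means any digit, and every other character must match literally. */
static int is_http_fixdate(const char *s)
{
    static const char tmpl[] = "aaa, 99 aaa 9999 99:99:99 GMT";
    size_t i;

    if (strlen(s) != sizeof tmpl - 1)
        return 0;
    for (i = 0; tmpl[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (tmpl[i] == 'a') {
            if (!isalpha(c))
                return 0;
        } else if (tmpl[i] == '9') {
            if (!isdigit(c))
                return 0;
        } else if (tmpl[i] != s[i]) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if the header was kept, 0 if it was logged and dropped. */
static int set_last_modified_from_cgi(struct cgi_response *r, const char *value)
{
    if (!is_http_fixdate(value)) {
        fprintf(stderr,
                "dropping unparseable Last-Modified from CGI: \"%s\"\n", value);
        r->has_last_modified = 0;
        r->last_modified[0] = '\0';
        return 0;
    }
    strcpy(r->last_modified, value);  /* safe: length was checked to be 29 */
    r->has_last_modified = 1;
    return 1;
}
```

Conditional-request handling would then consult has_last_modified and behave as if the script had never emitted the header when it is 0.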
> The alternative is to expect the CGI to honor HTTP/1.1 header
> semantics, pass values unchanged and let CGI and client run into
> misunderstandings immediately.
Practically, I'm not super opposed to this alternative (but if we choose
it, we should apply it consistently). If I put on my spec-lawyer hat, the
CGI RFC has this to say:
[https://tools.ietf.org/html/rfc3875#section-6.2.1]
> The server MUST make any appropriate modifications to the script's
> output to ensure that the response to the client complies with the
> response protocol version.
So this alternative is not my first choice. Invalid headers should
really either be corrected (if the correction is obvious, safe, and
helpful), or dropped entirely. Or the entire response should be 500'd,
but we run into major compatibility breaks if we choose that route.
And finally, from the latest patch from Luca:
> 2) Some comments have been added in the code to state clearly that
> any non-compliant datetime strings will not be interpreted or re-formatted.
As stated above, this is not my first choice -- but I wouldn't oppose it
if that's what the consensus comes to.
>      else if (!ap_cstr_casecmp(w, "Last-Modified")) {
> -        apr_time_t parsed_date = apr_date_parse_rfc(l);
> +        apr_time_t parsed_date = apr_date_parse_http(l);
apr_date_parse_http() is not good enough; IIUC, it completely ignores
timezones, which further corrupts non-GMT Last-Modified stamps. We
either want strict parsing or actual correction, not something in the
middle.
--Jacob