On Mon, Aug 1, 2016 at 2:13 PM, Jacob Champion <champio...@gmail.com> wrote:
> All right, getting back to this after a week off. I've tried to combine > feedback as best I can into one message. > > Bill, you wrote: > > I'm perfectly happy to translate such values to GMT for non-HTTP >> inputs, per spec. If we are going to do so for HTTP inputs, loudly >> scolding the errant developer in the logs seems prudent, for their own >> longer-term benefit. >> > > I suspect this is the core of the argument now. In my opinion, CGI is not > an HTTP input -- either as far as the spec is concerned (for spec lawyering > purposes) or in practice. It is a separate (HTTP-like) protocol, and an > implementation detail of the server. In other words, we are not "proxying" > to a CGI app; the CGI app is contained by the server and is providing > inputs to the server response. > > That is a separate argument from "is it wise to correct GMT timestamps", > though. > > 1. Why do you/reporter want HTTP applications to persist in writing code >> which breaks between different transport providers/cgi hosting >> environments? >> The language has been crystal clear for 2 decades. We do a huge disservice >> to the PHP author community to let them be idiots. Alternately, the PHP >> SAPI itself could rectify this. (We aren't talking about non-HTTP >> sources.) >> > > I'm not sure where PHP enters the conversation. They are only one (large > and important!) CGI producer; we're talking about our behavior with *all* > CGI applications here. > > I do like your argument that we should do as little transformation as > possible, in order to facilitate moving CGI apps between environments. > Implementation differences are nasty for everyone. But I'm not convinced > that ship hasn't sailed; currently, it looks like we modify outgoing CGI > responses in order to merge headers, normalize Content-Type, and produce > Unmodified and Precondition Failed responses. > Merging headers is a well-defined behavior in the HTTP spec (see my earlier comments on RFC 3875). Three CGI header fields are CGI-specific. I don't know that you can correlate these mandated adjustments to date formats. There may be others I have missed, but this doesn't look like the behavior > of a server that considers itself a transparent "passthrough" to a CGI > application. (Isn't that what CGI-NPH is for?) But! I could definitely be > swayed otherwise, if that's what we'd like to do moving forwards. I think > both sides have potential value, but we should choose one. Let's not even get into NPH scripts... heh. > If there is date input that we cannot handle, the >> spec strongly encourages us to interpret it as now(), provided we have a >> clock (which all of our architectures do.) >> > > In the absence of a quote from the spec, I'm still in strong disagreement > with this, based on the language I quoted last week. > The spec demands future -> now() if we know now(). On other erroneous data, that's up for debate. Moving on to Stefan's comments: > > If we see CGI as a kind of input that is not strictly regulated by >> HTTP header formats (and that is an if), we should correct timezone >> offset to GMT, but otherwise leave the time unchanged. It might be our >> clock that has the issue. Meddling with it will not help anyone >> debugging problems. >> > > +1 (and I am currently of the opinion that CGI is not a strict HTTP input, > as stated above). > Jacob and I have entirely different opinions on that, Stefan's comment is very interesting. We have a clock on nearly every deployment of httpd, but that doesn't mean it is correct. However the spec absolutely demands that we do not send our untrusted "Date:" with a future "Last-Modified:" value, it is externally inconsistent. > If the value is unparseable, we should log it and suppress sending >> outa "Last-Modified" completely. Also any "If-*" checking should >> behave as if the header was not present. >> > > +1. Sounds right. > The alternative is to expect the CGI to honor HTTP/1.1 header >> semantics, pass values unchanged and let CGI and client run into >> misunderstandings immediately. >> > > Practically, I'm not super opposed to this alternative (but if we choose > it, we should apply it consistently). If I put on spec-lawyer hat, the CGI > RFC has this to say: > > [https://tools.ietf.org/html/rfc3875#section-6.2.1] > > The server MUST make any appropriate modifications to the script's >> output to ensure that the response to the client complies with the >> response protocol version. >> > > So this alternative is not my first choice. Invalid headers should really > either be corrected (if the correction is obvious, safe, and helpful), or > dropped entirely. Or the entire response should be 500'd, but we run into > major compatibility breaks if we choose that route. > No, I agree that a 500 is not an option. Drop it, or fix it loudly in the logs, but we can't break existing deployments (which don't accept non-GMT dates today, FWIW). > And finally, from the latest patch from Luca: > > 2) Some comments have been added in the code to state clearly that >> anynon compliant datetime strings will not be interpreted or re-formatted. >> > > As stated above, this is not my first choice -- but I wouldn't oppose it > if that's what the consensus comes to. > Right, the comments call out what the HTTP spec demanded. > else if (!ap_cstr_casecmp(w, "Last-Modified")) { >> - apr_time_t parsed_date = apr_date_parse_rfc(l); >> + apr_time_t parsed_date = apr_date_parse_http(l); >> > > apr_date_parse_http() is not good enough; IIUC, it completely ignores > timezones, which further corrupts non-GMT Last-Modified stamps. We either > want strict parsing or actual correction, not something in the middle. I'm happy to react to bad input following a parsable date string, e.g. any non-"GMT" signifier, as entirely bad input worth emitting to the error log. Let's help CGI authors find their bugs instead of generating unexpected results, such as 5-8 hour wrong "Last Modified" dates in the US and now() throughout Europe, owing to the time zone deltas.