Hello, Thank you for your detailed review!
2024年6月24日(月) 9:02 Maxim Dounin <mdou...@mdounin.ru>: > > Hello! > > On Thu, Jun 20, 2024 at 08:39:43PM +0900, Hiroaki Nakamura wrote: > Note that it might be a good idea to update the commit log to > follow style rules, such as: > > : Cache: added calculation of the Age header. > : > : Implement the calculation of the Age header as specified in > : RFC 9111 "HTTP Caching", https://www.rfc-editor.org/rfc/rfc9111.html I have updated the commit message. > > @@ -75,6 +77,8 @@ > > time_t error_sec; > > time_t last_modified; > > time_t date; > > + time_t response_time; > > Note that the "response_time" field being added seems to be > exactly equivalent to the exiting "date" field. Yes, the "response_time" is equivalent to the "date" field when receiving a response from the upstream. However the 'date" is updated when sending a cached response. We need to keep the cache creation time. I have renamed "response_time" to "creation_time" to make it clearer. > [diff] > showfunc=1 I have added this config to my ~/.hgrc. > Note that this uses data from the cache node, which is not > available if the cache node was loaded from disk, for example, on > server restart, and this will certainly lead to incorrect results. > If I'm reading the code correctly, it will result in seconds since > the Epoch in the Age header on all responses after a restart. > > This suggests that either things should be implemented quite > differently, or this patch needs to be merged with the second one, > which implements loading relevant information from the cache > header. I have merged this patch with the second one. > > + ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0, > > + "http file cache send, resp:%O, resident:%d, age:%d", > > + c->response_time, resident_time, current_age); > Please also note that using "%d" for time_t is wrong, as time_t > size might be different from the size of int. For time_t, "%T" > should be used instead. I have changed to use "%T". > > diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_request.h > > --- a/src/http/ngx_http_request.h Thu Jun 20 20:26:24 2024 +0900 > > +++ b/src/http/ngx_http_request.h Thu Jun 20 20:26:41 2024 +0900 > > @@ -290,6 +290,7 @@ > > off_t content_offset; > > time_t date_time; > > time_t last_modified_time; > > + off_t age_n; > > Using off_t here looks wrong, as off_t is a type for file offsets, > and not for dates. I have changed the type of age_n to time_t. > > diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_upstream.c > > --- a/src/http/ngx_http_upstream.c Thu Jun 20 20:26:24 2024 +0900 > > +++ b/src/http/ngx_http_upstream.c Thu Jun 20 20:26:41 2024 +0900 > > @@ -1549,6 +1559,7 @@ > > ngx_memzero(u->state, sizeof(ngx_http_upstream_state_t)); > > > > u->start_time = ngx_current_msec; > > + u->request_time = ngx_time(); > > Adding a duplicate start time shouldn't be needed. It is not a duplicate since u->request_time is real clock time as opposed to u->start_time is monotonic clock. We need a real clock time of response_time since we calculate the difference from the upstream date value. Also we use the calculation response_time - request_time when the upstream date is not present or not well synchronized (see below), so we also need request_time to be a real clock time. > > @@ -2529,6 +2541,8 @@ > > return; > > } > > > > + ngx_http_upstream_update_age(r, u, ngx_time()); > > Note that the "ngx_time()" as an argument looks unneeded, it > should be better to obtain the time in the function itself. I have changed not to pass ngx_time() as an argument. > Also, from semantic point of view this probably should be in > ngx_http_upstream_process_headers(), and not a dedicated function > call. I have moved the code for * resident_time = now - response_time; * current_age = corrected_initial_age + resident_time; from ngx_http_cache_send to ngx_http_upstream_process_headers. As for ngx_http_upstream_update_age, I tried to move it into ngx_http_upstream_process_headers(), but I could not make tests pass. So I keep ngx_http_upstream_update_age to be called from ngx_http_upstream_process_header and ngx_http_upstream_test_next. > > @@ -2648,7 +2663,12 @@ > > valid = ngx_http_file_cache_valid(u->conf->cache_valid, > > u->headers_in.status_n); > > if (valid) { > > - valid = now + valid; > > + ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0, > > + "adjust cache valid_sec:%O, " > > + "valid:%O, init_age:%d for 304", > > + now + valid - > > r->cache->corrected_initial_age, > > + valid, r->cache->corrected_initial_age); > > + valid = now + valid - r->cache->corrected_initial_age; > > Not sure if Age calculations should apply to caching time as > calculated with proxy_cache_valid directives, but if it does, it > should apply consistently, in all cases where proxy_cache_valid > times are used. At least ngx_http_upstream_intercept_errors() > case seems to be ignored in the patch. I have updated ngx_http_upstream_intercept_errors as well. > Also, if Age applies, there should be a way to ignore it, > similarly to how Cache-Control can be ignored with the > proxy_ignore_headers directive. I do not know how to do this. Maybe add { ngx_string("Age"), NGX_HTTP_UPSTREAM_IGN_AGE }, to ngx_http_upstream_ignore_headers_masks? > > @@ -2672,6 +2692,59 @@ > > } > > > > > > +static void > > +ngx_http_upstream_update_age(ngx_http_request_t *r, ngx_http_upstream_t *u, > > + time_t now) > > +{ > > + time_t response_time, date, apparent_age, response_delay, age_value, > > + corrected_age_value, corrected_initial_age; > > + > > + /* > > + * Update age response header. > > + * https://www.rfc-editor.org/rfc/rfc9111.html#name-calculating-age > > + */ > > + response_time = now; > > + if (u->headers_in.date != NULL) { > > + date = ngx_parse_http_time(u->headers_in.date->value.data, > > + u->headers_in.date->value.len); > > + if (date == NGX_ERROR) { > > + date = now; > > + } > > + } else { > > + date = now; > > + } > > + apparent_age = ngx_max(0, response_time - date); > > + > > + response_delay = response_time - u->request_time; > > + age_value = u->headers_in.age_n != -1 ? u->headers_in.age_n : 0; > > + corrected_age_value = age_value + response_delay; > > + > > + corrected_initial_age = ngx_max(apparent_age, corrected_age_value); > > + r->headers_out.age_n = corrected_initial_age; > > Note that this approach, as described in RFC 9111 as a possible > way to calculate the Age header, implies that time on both proxy > server and the origin server must be "reasonably well > synchronized", which is not really the case in many practical > situations. > > If the time on the origin server is mostly arbitrary, for example, > in the past, apparent_age might be very large, leading to a very > large corrected_initial_age as well, preventing caching. > > As such, I would rather consider a more robust algorithm instead. Yes, if the upstream date is too old, then apparant_age becomes too big. How about using the following calculation? if upstream date is between request_time and response_time, we assume the upstream's clock is reasonably well synchronized and use corrected_initial_age = age_value + (response_time - date); if upstream date is not present or not between request_time and response_time, we do not rely on the upstream date and use corrected_initial_age = age_value + (response_time - u->request_time); When sending a cached response, we calculate the Age as specified in RFC 9111: resident_time = now - response_time; current_age = corrected_initial_age + resident_time; where response_time is cache_creation_time. I have updated my patchset to use the above calculation. > > +#if (NGX_HTTP_CACHE) > > + if (r->cache) { > > + r->cache->response_time = response_time; > > + r->cache->corrected_initial_age = corrected_initial_age; > > + if (u->headers_in.adjusting_valid_sec) { > > + r->cache->valid_sec -= corrected_initial_age; > > The "adjusting_valid_sec" logic looks unclear (probably at least > needs a better name) and fragile. If I'm reading the code > correctly, it will do the wrong thing at least if the upstream > server returns something like: > > Cache-Control: max-age=60 > X-Accel-Expires: @<time> > > since u->headers_in.adjusting_valid_sec won't be cleared. I found that we must ignore Expires when Cache-Control with max-age or s-maxage is set according to https://www.rfc-editor.org/rfc/rfc9111#name-expires: "If a response includes a Cache-Control header field with the max-age directive (Section 5.2.2.1), a recipient MUST ignore the Expires header field. Likewise, if a response includes the s-maxage directive (Section 5.2.2.10), a shared cache recipient MUST ignore the Expires header field." I have renamed adjusting_valid_sec to relative_freshness and added guard for Expires and X-Accel-Expires: @<time> like below: /* * https://www.rfc-editor.org/rfc/rfc9111#name-expires * * If a response includes a Cache-Control header field with the max-age * directive (Section 5.2.2.1), a recipient MUST ignore the Expires * header field. Likewise, if a response includes the s-maxage directive * (Section 5.2.2.10), a shared cache recipient MUST ignore the Expires * header field. */ if (!u->headers_in.relative_freshness) { r->cache->valid_sec = expires; } > > + ngx_log_debug2(NGX_LOG_DEBUG_HTTP, u->peer.connection->log, 0, > > + "http upstream adjusted cache " > > + "valid_sec:%O, init_age:%O", > > + r->cache->valid_sec, corrected_initial_age); > > + } > > + } > > +#endif > > If I'm reading the code correctly, this will add the Age > header to responses which are not cached. This somewhat > contradicts to what RFC 9111 says in the description of the Age > header (https://www.rfc-editor.org/rfc/rfc9111.html#section-5.1-6): > > : The presence of an Age header field implies that the response was > : not generated or validated by the origin server for this request. > > Please also note that in many cases [free]nginx is expected to be > seen as the origin server, even if it caches some upstream server > responses according to the configuration (in many cases, ignoring > Cache-Control or Expires headers). In this case using the > Age header even on cached responses might not be desired. It > would be great to dig further into possible use cases and make > sure all are properly covered. I have changed to not add Age if the calculated value is zero and also updated tests. > > @@ -5319,6 +5397,39 @@ > > > > > > static ngx_int_t > > +ngx_http_upstream_process_age(ngx_http_request_t *r, > > + ngx_table_elt_t *h, ngx_uint_t offset) > > +{ > > + ngx_http_upstream_t *u; > > + > > + u = r->upstream; > > + > > + if (u->headers_in.age) { > > + ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, > > + "upstream sent duplicate header line: "%V: %V", " > > + "previous value: "%V: %V"", > > + &h->key, &h->value, > > + &u->headers_in.age->key, > > + &u->headers_in.age->value); > > + return NGX_HTTP_UPSTREAM_INVALID_HEADER; > > + } > > + > > + h->next = NULL; > > + u->headers_in.age = h; > > + u->headers_in.age_n = ngx_atoof(h->value.data, h->value.len); > > + > > + if (u->headers_in.age_n == NGX_ERROR) { > > + ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, > > + "upstream sent invalid "Age" header: " > > + ""%V: %V"", &h->key, &h->value); > > + return NGX_HTTP_UPSTREAM_INVALID_HEADER; > > + } > > Note that returning errors here violates RFC 9111 SHOULD > recommendations, notably > (https://www.rfc-editor.org/rfc/rfc9111.html#section-5.1): > > : Although it is defined as a singleton header field, a cache > : encountering a message with a list-based Age field value SHOULD > : use the first member of the field value, discarding subsequent > : ones. > > : If the field value (after discarding additional members, as per > : above) is invalid (e.g., it contains something other than a > : non-negative integer), a cache SHOULD ignore the field. > > Unless there are good reasons, a better solution might be to > report warnings and follow the recommendations instead. I have changed the invalid Age value and duplicated Age headers to report warnings. > > diff -r 73929a4f3447 -r c81df54e3d03 src/http/ngx_http_upstream.h > > --- a/src/http/ngx_http_upstream.h Thu Jun 20 20:26:24 2024 +0900 > > +++ b/src/http/ngx_http_upstream.h Thu Jun 20 20:26:41 2024 +0900 > > @@ -287,14 +287,17 @@ > > > > ngx_table_elt_t *cache_control; > > ngx_table_elt_t *set_cookie; > > + ngx_table_elt_t *age; > > > > off_t content_length_n; > > time_t last_modified_time; > > + off_t age_n; > > See above, "off_t" looks wrong. I have changed it to time_t. I am sending updated two patches: * [PATCH 1 of 2] Cache: added calculation of the Age header * [PATCH 2 of 2] Tests: Update and add tests for Age header Could you take another look? Thank you, Hiroaki Nakamura