Re: Wrong etag sent with mod_deflate

2006-12-13 Thread Brian Akins

Henrik Nordstrom wrote:

But the unique identity of the response entity is defined by request-URI
+ ETag and/or Content-Location. The cache is not supposed to evaluate
Accept-* headers in determining the entity identity, only the origin
server.


However, on an initial request (ie, non-conditional) we do not have an etag from 
the client, we only have info like Host, URI, Accept-*, etc.  So, how would the 
cache identify which entity to serve in this case?



Please see RFC2616 13.6 Caching Negotiated Responses, it explains how
the RFC intends that caches should operate wrt Vary, ETag and
Content-Location in full detail.


I have read it many times. In our case - cnn.com, etc. - we have decided to 
be RFC compliant from the client to the cache server.  From the cache to the 
origin, however, we are not as concerned.  In a reverse-proxy-cache, this is not 
a big deal. However, in a normal forward-proxy-cache, where one does not 
control both cache and origin, one must be more careful.



--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies


Re: Wrong etag sent with mod_deflate

2006-12-13 Thread Henrik Nordstrom
On Wed, 2006-12-13 at 08:51 -0500, Brian Akins wrote:

 However, on an initial request (ie, non-conditional) we do not have an etag from
 the client, we only have info like Host, URI, Accept-*, etc.  So, how would the
 cache identify which entity to serve in this case?

You have the URL and the other cached entities of that URL. It does
not matter if the client request was a conditional or not. The
conditions in the request are on the response to see if it should be a
200 or 304, not selectors on what entity to respond with. The selected
response entity is always the same for the same request, with or without
conditions.

Obviously on the very first request for a given URL you have nothing,
and that request is forwarded without any added condition. However,
after that every Vary cache miss on that URL is an If-None-Match
conditional to ask the server if any of the cached entity variants is
applicable for the current request.
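
The revalidation step described above can be sketched in C. This is only a minimal illustration under assumptions: the `struct variant` type and the `build_if_none_match` helper are hypothetical names, not Squid or httpd code. It collects the ETags of every variant already cached under a URL into one If-None-Match header, and declines when nothing is cached yet (the "very first request" case).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical cached variant; only the entity tag matters here. */
struct variant {
    const char *etag;   /* quoted entity tag, e.g. "\"1234\"" */
};

/*
 * Write "If-None-Match: <etag1>, <etag2>, ..." into buf.
 * Returns 0 on success, -1 if there are no cached variants
 * (forward the request unconditionally) or if buf is too small.
 */
static int build_if_none_match(const struct variant *v, size_t n,
                               char *buf, size_t buflen)
{
    size_t used;
    size_t i;
    int w;

    if (n == 0) {
        return -1;
    }
    w = snprintf(buf, buflen, "If-None-Match: ");
    if (w < 0 || (size_t)w >= buflen) {
        return -1;
    }
    used = (size_t)w;
    for (i = 0; i < n; i++) {
        /* Separate the entity tags with ", " after the first one. */
        w = snprintf(buf + used, buflen - used, "%s%s",
                     i ? ", " : "", v[i].etag);
        if (w < 0 || (size_t)w >= buflen - used) {
            return -1;
        }
        used += (size_t)w;
    }
    return 0;
}
```

With two cached variants tagged "1234" and "1234-gzip", the header lists both tags, and the ETag in the server's 304 reply then tells the cache which stored entity to send.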

 I have read it many times. In our case - cnn.com, etc. - we have decided to
 be RFC compliant from the client to the cache server.  From the cache to the
 origin, however, we are not as concerned.

And you are free to. A reverse proxy is by definition the origin server.
How it finds the content is of no concern to the RFC, just happens to be
HTTP and not plain files, NFS, database or whatever.

 In a reverse-proxy-cache, this is not 
 a big deal. However, in a normal forward-proxy-cache, where one does not 
 control both cache and origin, one must be more careful.

Indeed.

But on the other hand it's actually reverse proxy configurations which
have pushed for 13.6 compliance in Squid, as it's a lot easier for
processing-intensive servers to evaluate If-None-Match than to render
the entity again, and when you depend on Accept-Language +
Accept-Encoding + User-Agent the number of request combinations becomes
quite significant, especially if there are maybe only two or three
variants under the URL.

Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-12 Thread Brian Akins

Henrik Nordstrom wrote:

On Mon, 2006-12-11 at 14:25 -0500, Brian Akins wrote:

So, multiple variants of the same object can have the same Etag, but still be 
different cached objects.


Your implementation ignores RFC 2616 13.6 Caching Negotiated Responses,
but is otherwise fine. It's functionally compliant but not as effective
as it could be.


That was a simplified explanation, we actually do not store a cache entry for 
every single variant.  In our case the only thing we actually ever care about is 
whether or not you support gzip.  So all the variants for Vary: User-Agent, 
Accept-Encoding actually boil down to 2 variants - gzip or no-gzip.


One of the major reasons we quit using squid was its support for Vary. (This 
was pre-3.0, so things may have changed.) Of course, at the time httpd wasn't 
any better - but it was a lot easier to hack ;)



Variants are identified by ETag or Content-Location. Only if there is
neither ETag nor Content-Location in the response entity is the response
entity identified by the Vary request headers.

Only conditional requests from clients, generally, have If-None-Match headers. 
So the only way for a cache, on an initial request from a client, to determine 
what object to serve is to use the Client supplied information - which doesn't 
include an Etag, so you have to, usually, rely on URI first, and then the Vary 
information.



--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies


Re: Wrong etag sent with mod_deflate

2006-12-12 Thread Henrik Nordstrom
On Tue, 2006-12-12 at 09:20 -0500, Brian Akins wrote:

 Only conditional requests from clients, generally, have If-None-Match headers.

Correct. It's a conditional. These days you also see them from Squid
btw.

 So the only way for a cache, on an initial request from a client, to determine
 what object to serve is to use the Client supplied information - which doesn't
 include an Etag, so you have to, usually, rely on URI first, and then the Vary
 information.

Indeed. This is always the case. If-None-Match MUST NOT be used for
identification of which response to use. It's a conditional only.

But the unique identity of the response entity is defined by request-URI
+ ETag and/or Content-Location. The cache is not supposed to evaluate
Accept-* headers in determining the entity identity, only the origin
server.

The identity of the entity is important for

- Cache correctness, making sure updates invalidate cached copies where
needed.

- Avoiding duplicated storage

There may be any number of request header combinations in any Vary
dimensions all mapping to the same entity.

This logic is not at all unique to Accept-Encoding. The logic of how
a cache is supposed to operate applies equally to all Vary-indicated
headers. The specs do not make any distinction between
Accept-Encoding, Accept-Language, User-Agent etc in how caches are
supposed to operate. It all boils down to the entity identified by URI +
ETag and/or Content-Location as returned in 200 and 304 responses
allowing the cache to map requests to entities.

Please see RFC2616 13.6 Caching Negotiated Responses, it explains how
the RFC intends that caches should operate wrt Vary, ETag and
Content-Location in full detail.

Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-11 Thread Brian Akins
This is not a response to any post on this subject, but more of a comment.  Here 
is a real world example of how we use deflate and etags with our cache. (Note 
this is very similar to mod_cache, but I do not know the inner workings of it as 
well).


1. Generate key from URI and ap_get_servername
2. open cached object.  Is it Vary? no, goto step 5.
3. Is Vary. Generate new key.
4. Open cached object.
5. Check expiry time, exit if expired.
6. Load headers.
7. Call ap_meets_conditions (etags, IMS, etc.)  If yes, return 304 (or 
whatever).
8. If not meets_conditions, serve from cache.
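
Steps 1 and 3 of the list above (key generation) might look roughly like the following C sketch. It rests on assumptions: the function names and the `#` separator are invented for illustration, and the variant key is reduced to "gzip or no gzip" as described elsewhere in this thread. The Accept-Encoding check is deliberately crude and ignores q-values such as `gzip;q=0`.

```c
#include <stdio.h>
#include <string.h>

/* Step 1: primary key built from the server name and the URI. */
static int primary_key(const char *server, const char *uri,
                       char *out, size_t outlen)
{
    int w = snprintf(out, outlen, "%s%s", server, uri);
    return (w < 0 || (size_t)w >= outlen) ? -1 : 0;
}

/* Crude test: does the Accept-Encoding value mention gzip at all? */
static int accepts_gzip(const char *accept_encoding)
{
    return accept_encoding != NULL
        && strstr(accept_encoding, "gzip") != NULL;
}

/*
 * Step 3: secondary key for a resource that varies. All request header
 * combinations collapse into two variants: "gzip" or "identity".
 */
static int variant_key(const char *primary, const char *accept_encoding,
                       char *out, size_t outlen)
{
    int w = snprintf(out, outlen, "%s#%s", primary,
                     accepts_gzip(accept_encoding) ? "gzip" : "identity");
    return (w < 0 || (size_t)w >= outlen) ? -1 : 0;
}
```

Because the second key depends only on the request headers, two cached objects reached this way can carry the same ETag, which is the situation the rest of the thread discusses.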

So, multiple variants of the same object can have the same Etag, but still be 
different cached objects.


This probably has no bearing on the current conversation, but perhaps I am not 
fully appreciating the core of the debate??


--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies


Re: Wrong etag sent with mod_deflate

2006-12-09 Thread TOKILEY
Let me preface all comments by saying that I AGREE with BOTH
Roy and Henrik... If Apache is sending the same exact (strong)
ETag value for both a compressed and an identity variant of
the same entity... then, according to current RFC content, 
that is broken behavior and it should be fixed.

You can take the part of the RFC that talks specifically about
how Weak Etags might seem ideal for compressed variants and
argue that against Henrik's point of view that a compressed
variant should ALWAYS be treated as a separate (unique) HTTP
entity but I don't want to go there. Not now, anyway.

Personally I tend to agree with the concept that even if DCE is 
employed ( Dynamic Content Encoding ) that any code that is
doing DCE ( versus Transfer Encoding ) should make that 
dynamically generated entity appear as if it was simply
a disk-based (separate) resource. 

DCE is, after all, just a magic trick. It is making it
APPEAR to end-users as if compressed variants of entities
actually physically exist and are being sent back to 
anyone ready/able/willing to receive them...

...and it's a GOOD TRICK, when done correctly.

 Roy wrote

 In other words, Henrik has it right.  It is our responsibility to
 assign different etags to different variants because doing otherwise
 may result in errors on shared caches that use the etag as a variant
 identifier.

See above. Totally agree.

 Justin wrote...

 As Kevin mentioned, Squid is only using the ETag and is ignoring the
 Vary header. That's the crux of the broken behavior on their part.

 Roy wrote...

 Then they will still be broken regardless of what we do here.  It simply
 isn't a relevant issue.

It's relevant to the extent that I think there are still some things
missing from the RFCs with regards to all this which is why a piece
of software like SQUID might be doing the wrong thing as well.

Best way I could elaborate on that feeling is to just walk
through Roy's scenario...

 Roy wrote...

 Unlike Squid, RFC compliance is part of our mission, at least when
 it isn't due to a bug in the spec.  This is not a bug in the spec.

 A high-efficiency response cache is expected to have multiple
 representations of a given resource cached.  

No doubt.

 The cache key is the URI.  

Yes.

 If the set of varying header field values that 
 generated the cached response is different from the request set, 

...as when one browser asks for a URI and sends 
Accept-encoding: gzip and another asks for the same URI
and does NOT supply Accept-encoding: gzip...

 then a conditional GET request is made containing ALL of the 
 cached entity tags in an If-None-Match field 
 (in accordance with the Vary requirements). 

...and, currently, if the cache has stored both a compressed and
a non-compressed version of the same entity received from Apache
( sic: mod_deflate ) then the same ( strong ) ETag is returned
in the conditional GET for both of the cached variants.

Hmmm... begins to look like a problem... but is it really?... 

 If the server says that any one of the representations,
 as indicated by the ETag in a 304 response, is okay, 

"okay" means "fresh".

In the case of a DCE encoded variant, an argument could be made
here that it doesn't make a bit of difference if the ETag for
the compressed or non-compressed variant is the 'same' or it is
'different'. All the cache really wants to know is "Is the
ORIGINAL ( uncompressed ) version of this response fresh
or not?"

The compressed variant should ALWAYS be just the encoded version of 
the same original uncompressed entity. If the original uncompressed
version ( indicated by strong ETag 1 ) is not fresh then there is
no possible way for any compressed variant of the same entity
( marked by the same strong ETag 1 ) to be fresh. It's just
not possible.

So, in essence, when the Vary: has to do with just compression,
then the compressed and uncompressed variants are married in
a way that, perhaps, is not covered in the existing ETag RFC
specifications. The ETag CAN/SHOULD be the same because there
is no way for the original ( strong ETag ) to become not fresh 
without the other representation also becoming not fresh.

These kinds of Variants are Synced in a way perhaps not
( currently ) covered by the ETag specs.

 then the cached representation with that entity tag is sent to 
 the user-agent regardless of the Vary calculation.

"sent to" means the cache has received its 304 response and 
decided what it CAN/SHOULD send back to the user, right?

Well... if you follow the argument above about how certain
variants are synced together then even if two variants on
the cache share the same strong ETag... then how can the
cache send back the wrong thing or NOT pay attention to
the Vary calculation on its end?

I don't know the exact details of the exact field problem
that Henrik is trying to solve but it seems to me that EVEN
THOUGH the compressed and non-compressed variants might
happen to share the same (strong) ETag... if SQUID is delivering

Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Ruediger Pluem


On 12/09/2006 06:52 AM, Roy T. Fielding wrote:

 
 The best solution is to not mess with content-encoding at all, which
 gets us out of both this consistency problem and related problems
 with the entity-header fields (content-md5, signatures, etc.).
 That is why transfer encoding was invented in the first place.
 
 We should have an implementation of deflate as a transfer encoding,
 but it should be configurable independent of the existing filter.
 Some people will want TE specifically to avoid the addition of Vary
 and all the other problems that content-changing filters cause.
 For example, an additional directive option for CE, TE, or either.

I think fixing the current CE filter is easier right now than to
add the option above. I think this can be done in a second step
and sounds like a good idea to me.

 
 The existing filter needs to modify the ETag field value (and
 any other entity-dependent values that we can think of) or be
 removed as a feature.  Weak etags are not a solution -- being able
 to make range requests of large cached representations requires a
 strong etag, and it really isn't hard to provide one.  It is better
 to not deflate the response at all than to interfere with caching.

Would the following patch address all your points for a CE mod_deflate filter?

Index: modules/filters/mod_deflate.c
===================================================================
--- modules/filters/mod_deflate.c   (Revision 484803)
+++ modules/filters/mod_deflate.c   (Arbeitskopie)
@@ -320,6 +320,7 @@
     if (!ctx) {
         char *token;
         const char *encoding;
+        const char *etag;

         /* only work on main request/no subrequests */
         if (r->main != NULL) {
@@ -483,7 +484,26 @@
         else {
             apr_table_mergen(r->headers_out, "Content-Encoding", "gzip");
         }
+        /*
+         * Unset headers which are no longer valid after we have compressed
+         * the content.
+         */
         apr_table_unset(r->headers_out, "Content-Length");
+        apr_table_unset(r->headers_out, "Content-MD5");
+        /* Adjust ETag if present */
+        etag = apr_table_get(r->headers_out, "ETag");
+        if (etag) {
+            if (*etag) {
+                /* Remove the '"' at the end of the ETag */
+                etag[strlen(etag) - 1] = '\0';
+                apr_table_set(r->headers_out, "ETag",
+                              apr_pstrcat(r->pool, etag, "-gzip\"", NULL));
+            }
+            else {
+                /* Does not seem to be a valid ETag. So remove it. */
+                apr_table_unset(r->headers_out, "ETag");
+            }
+        }

         /* initialize deflate output buffer */
         ctx->stream.next_out = ctx->buffer;
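
The ETag decoration in the patch can be sketched outside of httpd. Note that the patch as posted declares `etag` as `const char *` and then writes through it. The stand-alone version below avoids that by building the new value in a separate buffer. `decorate_etag` is a hypothetical helper that uses plain `malloc` instead of APR pools, and it is only meant to illustrate the string handling.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of etag with suffix inserted just before
 * the closing double quote, or NULL if etag is not a quoted entity tag
 * (optionally prefixed with W/). The caller frees the result.
 */
static char *decorate_etag(const char *etag, const char *suffix)
{
    const char *q = etag;
    size_t len = strlen(etag);
    size_t slen = strlen(suffix);
    size_t qlen;
    char *out;

    if (strncmp(q, "W/", 2) == 0) {
        q += 2;                         /* skip the weakness marker */
    }
    qlen = strlen(q);
    if (qlen < 2 || q[0] != '"' || q[qlen - 1] != '"') {
        return NULL;                    /* not a valid quoted tag */
    }
    out = malloc(len + slen + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, etag, len - 1);         /* all but the closing quote */
    memcpy(out + len - 1, suffix, slen);
    out[len - 1 + slen] = '"';
    out[len + slen] = '\0';
    return out;
}
```

For example, `decorate_etag("\"1234\"", "-gzip")` yields `"1234-gzip"`, while an unquoted value is rejected instead of being truncated.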

Regards

Rüdiger



Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Justin Erenkrantz

On 12/9/06, Ruediger Pluem [EMAIL PROTECTED] wrote:

 The existing filter needs to modify the ETag field value (and
 any other entity-dependent values that we can think of) or be
 removed as a feature.  Weak etags are not a solution -- being able
 to make range requests of large cached representations requires a
 strong etag, and it really isn't hard to provide one.  It is better
 to not deflate the response at all than to interfere with caching.

Would the following patch address all your points for a CE mod_deflate filter?


No - this patch breaks conditional GETs which is what I'm against.

See the problem here is that you have to teach ap_meets_conditions()
about this.  An ETag of "1234-gzip" needs to also satisfy a
conditional request when the ETag when ap_meets_conditions() is run is
"1234".  In other words, ap_meets_conditions() also needs to strip
"-gzip" if it is present before it does the ETag comparison.  But, the
issue is that there is no real way for us to implement this without a
butt-ugly hack.

However, I disagree with Roy in that we most certainly *do* treat the
ETag values as opaque - Subversion has its own ETag values - Roy's
position only works if you assume the core is assigning the ETag value
which has a set format - not a third-party module.  IMO, any valid
solution that we deploy must work *independently* of what any module
may set ETag to.  It is perfectly valid for a 3rd-party module to
include "-gzip" at the end of their ETag.  For example, if you had a
file called foo-gzip in revision 10, SVN would assign the ETag
"10//foo-gzip".  (And, I could construct a conflict where httpd would
hork the ETag incorrectly for any arbitrary value.)  -- justin
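
The conflict described above is easy to reproduce with a naive undecoration routine. The sketch below is hypothetical (`strip_gzip_suffix` is not an httpd function). It removes a trailing -gzip from a quoted ETag in place, and it cannot tell a decoration added by mod_deflate from a suffix that belongs to the module's own tag.

```c
#include <string.h>

/*
 * Naive undecoration: if etag ends in -gzip" then drop the suffix in
 * place, keep the closing quote, and return 1. Otherwise return 0 and
 * leave etag untouched.
 */
static int strip_gzip_suffix(char *etag)
{
    static const char tail[] = "-gzip\"";
    size_t len = strlen(etag);
    size_t tlen = sizeof(tail) - 1;

    if (len <= tlen || strcmp(etag + len - tlen, tail) != 0) {
        return 0;
    }
    etag[len - tlen] = '"';
    etag[len - tlen + 1] = '\0';
    return 1;
}
```

Applied to `"1234-gzip"` this gives `"1234"` as intended, but applied to the Subversion-style tag `"10//foo-gzip"` it gives `"10//foo"`, which names a different resource.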


Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Justin Erenkrantz

On 12/9/06, Roy T. Fielding [EMAIL PROTECTED] wrote:

The best solution is to not mess with content-encoding at all, which
gets us out of both this consistency problem and related problems
with the entity-header fields (content-md5, signatures, etc.).
That is why transfer encoding was invented in the first place.


We don't live in a world that uses Transfer Encoding for gzip.
Firefox, MSIE, and Opera don't support it.  So, dropping Content
Encoding support in mod_deflate is a non-starter.


We should have an implementation of deflate as a transfer encoding,
but it should be configurable independent of the existing filter.
Some people will want TE specifically to avoid the addition of Vary
and all the other problems that content-changing filters cause.
For example, an additional directive option for CE, TE, or either.


As I said earlier, mod_deflate could respond if TE is sent - there's
no need for a directive here.  And it can sidestep the ETag violation
there.  It's a trivial addition to the current filter of just a few
lines.  And, it gives the one cache in the world that doesn't support
Vary a way out.  So, I feel that this resolves the RFC violation that
Squid sees as long as it sends TE: gzip instead.


The existing filter needs to modify the ETag field value (and
any other entity-dependent values that we can think of) or be
removed as a feature.  Weak etags are not a solution -- being able
to make range requests of large cached representations requires a
strong etag, and it really isn't hard to provide one.  It is better
to not deflate the response at all than to interfere with caching.


As Rudiger's patch shows, removing the ETag or appending junk in
mod_deflate isn't enough - you have to teach ap_meets_conditions() how
to know what it is that it's looking at.  I'm against adding ugly
hacks there that make it only know how to handle "-gzip".
(mod_deflate could in theory very well send deflate compression.)
So, any solution within ap_meets_conditions() needs to be generic and
not a one-off just for mod_deflate.


In any case, I won't accept anyone's votes on this issue until there
is a patch that can be voted on, and the technical considerations of
security and correctness take priority over other trade-offs.  RTC.


The patch you have been outlining is straightforward - but ultimately
broken because you haven't sketched a way to handle the
ap_meets_conditions() problem.  I'm merely informing you that I will
veto any approach that breaks conditional GETs with real browsers.  I
couldn't care less what a broken proxy cache does (especially if we
can give it a way not to be broken) if it means that mod_deflate no
longer supports browser caches.  -- justin


Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Ruediger Pluem


On 12/09/2006 03:23 PM, Justin Erenkrantz wrote:
 On 12/9/06, Ruediger Pluem [EMAIL PROTECTED] wrote:


 Would the following patch address all your points for a CE mod_deflate
 filter?
 
 
 No - this patch breaks conditional GETs which is what I'm against.

Ok, to be honest my question was more directed to Roy than to you, to
understand Roy's ideas and plans from a patch level perspective. I
was pretty sure that you would not like it as you have expressed it clearly
before.

 
 See the problem here is that you have to teach ap_meets_conditions()
 about this.  An ETag of "1234-gzip" needs to also satisfy a
 conditional request when the ETag when ap_meets_conditions() is run is
 "1234".  In other words, ap_meets_conditions() also needs to strip
 "-gzip" if it is present before it does the ETag comparison.  But, the
 issue is that there is no real way for us to implement this without a
 butt-ugly hack.

Thanks for giving the pointer to ap_meets_conditions. So content compressed
by mod_deflate would not stand conditional requests based on ETags any longer.
That would be bad. Would it help if we simply unset the ETag in mod_deflate?
mod_filter does this in these situations or does this have any other nasty
side effects?

So what I understand from the current discussion is that

1. Using TE instead of CE would be RFC compliant and would relieve us of
   many problems, except the one that none of the major browsers can handle
   it and thus it would effectively make mod_deflate useless.

2. There are two different points of view in the CE case:
   Roy and Henrik say that a strong ETag arriving at mod_deflate must
   be replaced with a different strong ETag within mod_deflate (e.g.
   by adding "-gzip" to it), because as mod_deflate is doing CE the entities
   before and after mod_deflate are different and require different ETags.
   Justin OTOH says that it is sufficient to convert a strong ETag into a
   weak one, right?

Regards

Rüdiger




Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Justin Erenkrantz

On 12/9/06, Ruediger Pluem [EMAIL PROTECTED] wrote:

Thanks for giving the pointer to ap_meets_conditions. So content compressed
by mod_deflate would not stand conditional requests based on ETags any longer.
That would be bad. Would it help if we simply unset the ETag in mod_deflate?
mod_filter does this in these situations or does this have any other nasty
side effects?


AIUI, many caches do not allow the response to be cached at all if it
doesn't have an ETag.  This is why it was brought up that not doing
deflate at all might be better in some cases than removing the ETag.


So what I understand from the current discussion is that

1. Using TE instead of CE would be RFC compliant and would relieve us of
   many problems, except the one that none of the major browsers can handle
   it and thus it would effectively make mod_deflate useless.


Right.


2. There are two different points of view in the CE case:
   Roy and Henrik say that a strong ETag arriving at mod_deflate must
   be replaced with a different strong ETag within mod_deflate (e.g.
   by adding "-gzip" to it), because as mod_deflate is doing CE the entities
   before and after mod_deflate are different and require different ETags.
   Justin OTOH says that it is sufficient to convert a strong ETag into a
   weak one, right?


In the ideal world, I think a weak ETag would be the 'right' thing -
however, the current spec doesn't allow conditional GETs to work with
weak ETags.  Therefore, to allow conditional GETs, mod_deflate can
only produce strong ETags.  However, to make conditional GETs work and
to create a different ETag, the transformation has to be reversible -
which I believe may become a sticking point.

(BTW, I disagree with Roy and Henrik that the transformation that
mod_deflate is applying changes the actual meaning of the content; but
that's largely an irrelevant and academic point for this list.)

HTH.  -- justin


Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Ruediger Pluem


On 12/09/2006 07:02 PM, Justin Erenkrantz wrote:
 On 12/9/06, Ruediger Pluem [EMAIL PROTECTED] wrote:
 
 Thanks for giving the pointer to ap_meets_conditions. So content compressed
 by mod_deflate would not stand conditional requests based on ETags any longer.
 That would be bad. Would it help if we simply unset the ETag in mod_deflate?
 mod_filter does this in these situations or does this have any other nasty
 side effects?
 
 
 AIUI, many caches do not allow the response to be cached at all if it
 doesn't have an ETag.  This is why it was brought up that not doing

AFAICS this is not the case for mod_cache. As long as at least one of
the following headers is present mod_cache can cache the response if
all other conditions needed are true:

ETag
Last-Modified
Expires

Regards

Rüdiger


Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Henrik Nordstrom
On Fri, 2006-12-08 at 15:35 -0800, Justin Erenkrantz wrote:

 As Kevin mentioned, Squid is only using the ETag and is ignoring the
 Vary header.  That's the crux of the broken behavior on their part.
 If they want to point out minor RFC violations in Apache, then we can
 play that game as well.  (mod_cache deals with this Vary/ETag case
 just fine, FWIW.)

We are not at all ignoring Vary, but we are using If-None-Match to ask
the server which one of the N already cached entities belonging to the
resource URI is valid for this specific request, indirectly learning the
server side content negotiation logics used.

 The compromise I'd be willing to accept is to have mod_deflate support
 the 'TE: gzip' request header and add 'gzip' to the Transfer-Encoding
 bit - and to prefer that over any Accept-Encoding bits that are sent.

Would be a great move if you cannot make it behave correctly in the
content space.

But if you make mod_deflate behave according to the RFC then sending
Content-Encoding: gzip is fine to me. But TE is a much better fit from
the RFC point of view.

 The ETag can clearly remain the same in that case - even as a strong
 ETag.

Yes.

 So, Squid can change to send along TE: gzip (if it isn't
 already).

TE: gzip is likely to appear in 3.1.

 And, everyone else who sends Accept-Encoding gets the
 result in a way that doesn't pooch their cache if they try to do a
 later conditional request.

As long as mod_deflate continues ignoring the RFC wrt ETag there will
be conflicts with various cache implementations.

 Is that acceptable?  -- justin

Intentionally not following a MUST-level requirement in the RFC is not
an acceptable solution in my eyes. For one thing even if you ignore
everyone else it would make it impossible for Apache + mod_deflate to
claim RFC 2616 HTTP/1.1 compliance.

Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Henrik Nordstrom
On Sat, 2006-12-09 at 15:23 +0100, Justin Erenkrantz wrote:

 See the problem here is that you have to teach ap_meets_conditions()
 about this.  An ETag of "1234-gzip" needs to also satisfy a
 conditional request when the ETag when ap_meets_conditions() is run is
 "1234".  In other words, ap_meets_conditions() also needs to strip
 "-gzip" if it is present before it does the ETag comparison.  But, the
 issue is that there is no real way for us to implement this without a
 butt-ugly hack.

Be careful there. Blindly stripping the decoration alone won't work
out. Consider for example If-None-Match. Specifically, If-None-Match with
the ETag of the gzip variant should only return 304 if the request would
cause Apache to send the gzipped variant of the entity.

If-None-Match: <list of etags>

returns 304 with the single correct ETag if any of the ETags in the
directive matches the current response to the current request.
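
A minimal C sketch of that rule follows, under stated assumptions. `if_none_match_hits` is a hypothetical helper, not `ap_meets_conditions()`. The caller is assumed to have already worked out `current_etag`, the tag of the variant this particular request would receive. Tags are compared with the weak comparison function, and a bare `*` is treated as matching any current entity.

```c
#include <string.h>

/*
 * Return 1 if the If-None-Match value in list contains the entity tag of
 * the response that the current request would produce, else 0.
 * A result of 1 means "answer 304 and echo current_etag".
 */
static int if_none_match_hits(const char *list, const char *current_etag)
{
    const char *cur = current_etag;
    const char *p = list;
    size_t curlen;

    if (strncmp(cur, "W/", 2) == 0) {
        cur += 2;                   /* weak comparison ignores W/ */
    }
    curlen = strlen(cur);

    while (*p) {
        const char *end;

        while (*p == ' ' || *p == '\t' || *p == ',') {
            p++;                    /* skip list separators */
        }
        if (*p == '*') {
            return 1;               /* matches any current entity */
        }
        if (strncmp(p, "W/", 2) == 0) {
            p += 2;
        }
        if (*p != '"') {
            break;                  /* end of list or malformed tag */
        }
        end = strchr(p + 1, '"');
        if (end == NULL) {
            break;                  /* unterminated tag */
        }
        if ((size_t)(end - p + 1) == curlen
            && strncmp(p, cur, curlen) == 0) {
            return 1;
        }
        p = end + 1;
    }
    return 0;
}
```

A request carrying only the gzip variant's tag therefore gets a 304 when it would be served the gzip variant, and a full 200 response when it would be served the identity variant.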


Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Henrik Nordstrom
On Sat, 2006-12-09 at 19:02 +0100, Justin Erenkrantz wrote:

 AIUI, many caches do not allow the response to be cached at all if it
 doesn't have an ETag.

Most still cache it, but for example Mozilla has bugs wrt Vary handling
if there is no ETag and the conditions change.

 In the ideal world, I think a weak ETag would be the 'right' thing

I don't have an opinion if you return a strong or weak ETag, but it must
still be different than the ETag of the identity encoded object, not
just the same ETag flagged as weak.

Your main decision on whether the ETag on the mod_deflate generated entity
should be weak or strong should be based on:

a) If the original entity is weak, then the mod_deflate generated one
MUST be weak as well.

b) If mod_deflate can not be trusted to generate the exact same octet
representation on each request then the ETag of the generated entity
MUST be weak.

Else the ETag SHOULD be strong.
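
The decision rule above fits in a few lines of C. This is a sketch only: `derived_etag_must_be_weak` and its `deterministic_output` flag are hypothetical names, and whether a given compressor really produces identical octets on every request is something the caller has to know.

```c
#include <string.h>

/* An entity tag is weak when it carries the W/ prefix. */
static int etag_is_weak(const char *etag)
{
    return strncmp(etag, "W/", 2) == 0;
}

/*
 * Return 1 if the ETag of the recoded entity MUST be weak, 0 if it
 * SHOULD be strong.
 */
static int derived_etag_must_be_weak(const char *original_etag,
                                     int deterministic_output)
{
    if (etag_is_weak(original_etag)) {
        return 1;   /* rule a: a weak original stays weak */
    }
    if (!deterministic_output) {
        return 1;   /* rule b: octets may differ between requests */
    }
    return 0;       /* otherwise the ETag SHOULD be strong */
}
```

Either way the derived tag still has to differ from the tag of the identity-encoded entity; this function only decides the weakness flag.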

 however, the current spec doesn't allow conditional GETs to work with
 weak ETags.

Err.. Weak ETags are allowed in If-None-Match for GET/HEAD.

Regards
Henrik






Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Henrik Nordstrom
On Fri, 2006-12-08 at 15:40 -0800, Justin Erenkrantz wrote:

 I think we all (hopefully) agree that a weak ETag is ideally what
 mod_deflate should add.

Please read RFC2616 13.6 Caching Negotiated Responses for an in-depth
description of how caches should handle Vary. And please stop lying
about Squid. If you think something in our cache implementation of
Vary/ETag is not right then say what and back it up with RFC reference.

My base requirement is that you comply with If-None-Match. For this you
MUST return a different ETag. It does not matter to me if it's weak or
strong as the main concern for a cache is GET/HEAD requests. Flagging
the existing ETag as weak does not make it a different ETag as
If-None-Match on GET/HEAD allows for the weak comparison function where
weakness is ignored.

13.3.3 Weak and Strong Validators

  - The weak comparison function: in order to be considered equal,
both validators MUST be identical in every way, but either or
both of them MAY be tagged as weak without affecting the
result.
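
For illustration, the two comparison functions of RFC 2616 13.3.3 can be written as below. The function names are assumptions made for this sketch and are not httpd APIs. It shows why flagging the existing ETag as weak does not make it a different ETag: under the weak comparison the W/ prefix is simply ignored.

```c
#include <string.h>

/* An entity tag is weak when it carries the W/ prefix. */
static int is_weak(const char *tag)
{
    return strncmp(tag, "W/", 2) == 0;
}

/* Strong comparison: identical in every way and neither tag is weak. */
static int etag_strong_equal(const char *a, const char *b)
{
    return !is_weak(a) && !is_weak(b) && strcmp(a, b) == 0;
}

/* Weak comparison: identical once any W/ prefix is set aside. */
static int etag_weak_equal(const char *a, const char *b)
{
    if (is_weak(a)) {
        a += 2;
    }
    if (is_weak(b)) {
        b += 2;
    }
    return strcmp(a, b) == 0;
}
```

So `"1234"` and `W/"1234"` are equal under the weak comparison used for If-None-Match on GET/HEAD, whereas `"1234"` and `"1234-gzip"` are different under both.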


Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Henrik Nordstrom
On Sat, 2006-12-09 at 05:44 -0500, [EMAIL PROTECTED] wrote:

 It's relevant to the extent that I think there are still some things
 missing from the RFCs with regards to all this which is why a piece
 of software like SQUID might be doing the wrong thing as well.

After reading the RFC on this topic many many times I can not agree that
it's that incomplete.

The scheme set by the RFC is quite complete as long as you stay with
strong ETags, allowing for cache correctness, update serialization and
many good things.

Situations requiring weak etags also work out pretty well for cache
correctness thanks to If-None-Match, but not for other operations, as weak
etags are banned from both non-GET/HEAD requests and If-Match conditions.
  
 ...and, currently, if the cache has stored both a compressed and
 a non-compressed version of the same entity received from Apache
 ( sic: mod_deflate ) then the same ( strong ) ETag is returned
 in the conditional GET for both of the cached variants.
  
 Hmmm... begins to look like a problem... but is it really?... 

It is.

See 13.6 Caching Negotiated Responses (all of it). And then skim over
14.26 If-None-Match, and finally read 10.3.5 304 Not Modified. Then
piece them together.

Also take note that nowhere is there any requirement on the cache to
evaluate any server driven content negotiation inputs (Accept-XXX etc).
This responsibility is fully at the origin server and reflected back via
ETag.

Caches evaluate Vary in finding the correct response entity.

  If the server says that any one of the representations,
  as indicated by the ETag in a 304 response, is okay, 
  
 "okay" means "fresh".

Not only that, it also tells which entity among the N cached ones is
valid to send as response to this request.

 happen to share the same (strong) ETag... if SQUID is delivering
 stale compressed variants when a 304 response says that the
 original identity variant is not fresh then that's just
 a colossal screw-up in the caching code itself.

The 304 says

Send the entity with the ETag XXX, it's still fresh. Nothing more. It
does not indicate whether this is the identity or the gzip encoding, nor
the content length, content type or anything else relevant to the actual
content besides the ETag and/or Content-Location.
 
 Regardless of what the server says... how could you ever get
 into a situation where you would consider a compressed variant
 of an entity fresh when the identity version is now stale? 

As HTTP did not consider dynamic content encoding it sees the two
entities as different objects (i.e. file and file.gz) and does not
enforce a strict synchronization between the two. The only requirement
set in the RFC is that the origin server SHOULD make sure the two
representations on the server are in sync.

 is seriously confused even if the ETags are the same and the
 cache is sending back stale compressed variants when the
 identity variant ( strong ETag value ) is also stale. 

I don't know what condition you refer to here. The Squid cache (2.6)
only remembers the last seen of the two, as the later response with the
same ETag overwrites the first.

 There's still something missing from the specs or something.

Not that I can tell.
 
 When an exact, literal interpretation of a spec tends to 
 defy common sense... my instinct is to suspect the spec itself.

In what way? There is something in your reasoning I don't get.
  
 DCE ( Dynamic Content Encoding ) is a valid concept even if it
 wasn't sufficiently imagined at the time the specs were
 codified. It works. It works WELL... and it is something that
 OUGHT to always be possible if the RFCs mean anything at all.

And it is possible. Just that you need to pay attention to

  Content-Location
  ETag
  Content-MD5

as all of these are affected by dynamically altering the entity by server
driven content negotiation with static or dynamic recoding of the
entity.

 One of the main prime directives for developing Apache 2.0
 at all was to finally re-org the IO stream so that schemes
 like DCE could be done more easily than were already being
 done in the 1.3.x framework. Mission was accomplished.
 Filtering was born. It would be a shame to consider abandoning
 one of the very concepts that gave birth to Apache 2.0 for 
 the sake of a few more lines of code that could take it
 into the end zone.

Agreed.
 
 No argument here. Transfer-encoding is about a DECADE overdue now.

And as already indicated it should be a piece of cake to add to mod_deflate,
and as HTTP support evolves in clients and caches is likely to lessen
the complexity of dealing with mod_deflate and conditionals
considerably.
 
 In the case of compressed entities it would still be a good idea
 to always add a standard header which indicates the original
 uncompressed content-length ( if it's possible to know it ).

There is no such header in HTTP, but you are free to propose one. But
it's worth noting that this information also exists in the gzip
encoding.
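
To illustrate that last point: the gzip format (RFC 1952) ends each
member with a 4-byte little-endian ISIZE field holding the uncompressed
length modulo 2^32. A minimal Python sketch, assuming a single-member
gzip stream; the helper name gzip_original_length is made up for this
example and is not part of any library.

```python
import gzip
import struct

def gzip_original_length(blob: bytes) -> int:
    """Return the ISIZE trailer field: uncompressed length modulo 2**32.

    Assumes `blob` is one complete gzip member (10-byte header plus
    8-byte trailer at minimum), not a concatenation of several.
    """
    if len(blob) < 18:
        raise ValueError("too short to be a gzip member")
    (isize,) = struct.unpack("<I", blob[-4:])
    return isize

body = b"hello world\n" * 100
compressed = gzip.compress(body)
print(gzip_original_length(compressed), len(body))
```

Because the field wraps at 4 GiB and sits at the very end of the
stream, it is only a hint; it does not help a recipient that needs the
length before the body has arrived.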

Current specs does not handle 

Re: Wrong etag sent with mod_deflate

2006-12-09 Thread TOKILEY
 Justin wrote...

 No - this patch breaks conditional GETs which is what I'm against.
 
 See the problem here is that you have to teach ap_meets_conditions()
 about this.  An ETag of 1234-gzip needs to also satisfy a
 conditional request when the ETag when ap_meets_conditions() is run is
 1234.  In other words, ap_meets_conditions() also needs to strip
 -gzip if it is present before it does the ETag comparison.  But, the
 issue is that there is no real way for us to implement this without a
 butt-ugly hack.

 However, I disagree with Roy in that we most certainly *do* treat the
 ETag values as opaque - Subversion has its own ETag values - Roy's
 position only works if you assume the core is assigning the ETag value
 which has a set format - not a third-party module.  IMO, any valid
 solution that we deploy must work *independently* of what any module
 may set ETag to.  It is perfectly valid for a 3rd-party module to
 include -gzip at the end of their ETag.  

...or -bzip2.

mod_bzip2 has been working fine for almost a year now and presents the
same issue Justin is talking about here.

It (can) generate its own ETag values, if you want it to ( configurable ),
and ap_meets_conditions isn't going to know what to strip or not strip.
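
The stripping idea Justin describes, and the reason it does not
generalize, can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: it is not Apache's
ap_meets_conditions(), the function names are hypothetical, and the
If-None-Match header is split naively on commas.

```python
def strip_suffix(etag, suffixes=("-gzip",)):
    """Remove a known encoding suffix from inside the quotes, if present."""
    if etag.endswith('"'):
        for s in suffixes:
            if etag[:-1].endswith(s):
                return etag[:-1 - len(s)] + '"'
    return etag

def none_match_satisfied(current_etag, if_none_match, suffixes=("-gzip",)):
    """True if some tag in If-None-Match equals the current ETag once
    known suffixes are stripped, i.e. a 304 would be appropriate.

    Naive parsing: assumes the tags themselves contain no commas.
    """
    candidates = [t.strip() for t in if_none_match.split(",")]
    return any(strip_suffix(t, suffixes) == current_etag for t in candidates)

print(none_match_satisfied('"1234"', '"1234-gzip"'))
print(none_match_satisfied('"1234"', '"1234-bzip2"'))
print(none_match_satisfied('"1234"', '"1234-bzip2"', ("-gzip", "-bzip2")))
```

The second call fails to match because the core only knows about
"-gzip"; every module that invents its own suffix would have to be
registered somewhere, which is the objection raised above.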

Yours
Kevin


Re: Wrong etag sent with mod_deflate

2006-12-09 Thread TOKILEY
 And please stop lying about Squid.

C'mon Henrik. No one is intentionally trying to LIE about Squid.

If you are referring to Justin quoting ME let me supply a big
fat MEA CULPA here and say right now that I haven't looked
at the SQUID Vary/ETag code since the last major release
and I DO NOT KNOW FOR SURE what SQUID is doing ( or
not doing ) if/when it sees the same (strong) ETag for both
a compressed and an identity version of the same entity.

Period. I DO NOT KNOW FER SURE.

I should have made that perfectly clear along with any
opinion previously offered.

I apologize for that.

I also DID already state clearly in another post...

 I don't know the exact details of the exact field problem
 that Henrik is trying to solve...

Keyphrase --don't know the exact details

In my other posts, I was suggesting, however, that even if
an upstream content server ( Apache ) is not sending separate
unique ETags I am still having a hard time understanding why
that would cause SQUID to deliver the wrong Varied response
back to the user.

Something is nagging at me telling me that EVEN IF the same
(strong) ETag happens to be on both a compressed and a
non-compressed version of the SAME ENTITY that there 
shouldn't be a big problem in the field ( sic: A user not
getting what they asked for ). 

A compressed version of an entity IS the same entity... for
all intents and purposes... it just has compression 
applied. One cannot possibly become stale without the
other also being stale at the same exact moment in time.

 If you think something in our cache implementation of
 Vary/ETag is not right then say what and back it up with RFC reference.

At the moment... yes... I do... but if you read my other posts I
also have a feeling the reason I can't quote you Verse and Chapter
from an RFC is because I have a sneaking suspicion that there
is something missing from the ETag/Vary scheme that can 
lead to problems like this... and it's NOT IN ANY RFC YET.

It has something to do with being too literal about a spec and
ignoring common sense.

In other words... you may be doing exactly what hours and hours
of reading an RFC seems to be telling you you SHOULD do... but
there still might be something else that OUGHT to be done.

I hope the discussion continues.

This is something that has been lurking for years now and it
needs to get resolved.

There will always be the chance that some upstream server will
( mistakenly? ) keep the same (strong) ETag on a compressed
variant. People are not perfect and they make mistakes. I still
think that even when that happens any caching software should
follow the "be lenient in what you accept and strict in what you
send" rule and still use the other information available to it
( sic: What the client really asked for and expects ) and 
do the right thing. Only the cache knows what the client
is REALLY asking for.

Yours...
Kevin


Re: Wrong etag sent with mod_deflate

2006-12-09 Thread Henrik Nordstrom
Sat 2006-12-09 at 20:38 -0500, [EMAIL PROTECTED] wrote:

 If you are referring to Justin quoting ME let me supply a big
 fat MEA CULPA here and say right now that I haven't looked
 at the SQUID Vary/ETag code since the last major release
 and I DO NOT KNOW FOR SURE what SQUID is doing ( or
 not doing ) if/when it sees the same (strong) ETag for both
 a compressed and an identity version of the same entity.

That's not the problem. The problem is that Apache tells us that we
should use whatever we got first on all subsequent responses.

The chain of events leading to the problem is as follows:

1. We forward request A. Let's say this claims Accept-Encoding: gzip.

2. Apache mod_deflate returns a gzipped entity with ETag
6bf1f7-6-1b6d6340 and Vary: Accept-Encoding.

3. We get another request with a different Accept-Encoding value. This
gets forwarded to Apache with an If-None-Match header telling the ETags
of the entities we have, i.e. If-None-Match: 6bf1f7-6-1b6d6340.

4. The entity hasn't changed and Apache responds with a 304 ETag
6bf1f7-6-1b6d6340 telling us that the valid response entity for this
request is the previously received response with ETag 6bf1f7-6-1b6d6340,
and any updated HTTP headers for that response.

The problem arises in '4'.
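
The four steps above can be played through with a toy model. The
Python sketch below is an assumption-laden illustration, not Squid or
Apache code: the cache keeps entities for one URL keyed by ETag, treats
every request as a Vary miss, and sends all known ETags in
If-None-Match. The names make_origin and Cache are hypothetical, and
the "-identity" suffix in the second run is used only to get distinct
tags.

```python
def make_origin(etag_for):
    """Build a fake origin whose ETag is chosen by etag_for(encoding)."""
    def origin(accept_encoding, if_none_match=()):
        enc = "gzip" if "gzip" in accept_encoding else "identity"
        etag = etag_for(enc)
        if etag in if_none_match:
            # A 304 carries only the ETag, no entity headers.
            return {"status": 304, "etag": etag}
        return {"status": 200, "etag": etag, "encoding": enc}
    return origin

class Cache:
    def __init__(self, origin_fn):
        self.origin_fn = origin_fn
        self.by_etag = {}  # cached entities of one URL, keyed by ETag

    def get(self, accept_encoding):
        resp = self.origin_fn(accept_encoding, tuple(self.by_etag))
        if resp["status"] == 304:
            # The origin says: serve the entity you hold under this ETag.
            return self.by_etag[resp["etag"]]
        self.by_etag[resp["etag"]] = resp
        return resp

shared = Cache(make_origin(lambda enc: '"6bf1f7-6-1b6d6340"'))
shared.get("gzip")                   # steps 1 and 2
print(shared.get("")["encoding"])    # steps 3 and 4: the gzip entity comes back

distinct = Cache(make_origin(lambda enc: '"6bf1f7-6-1b6d6340-%s"' % enc))
distinct.get("gzip")
print(distinct.get("")["encoding"])  # identity, as the client asked
```

With a shared ETag the 304 in step 4 leaves the cache no way to tell
the two encodings apart; with distinct ETags the same exchange ends in
a fresh 200 for the identity encoding.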

 Period. I DO NOT KNOW FER SURE.

Then stop saying "Squid is broken", "does not implement X" or "broken
clients such as Squid". That is all I ask. It is fine to say that you do
not understand why it is a problem for Squid.

 In my other posts, I was suggesting, however, that even if
 an upstream content server ( Apache ) is not sending separate
 unique ETags I am still having a hard time understanding why
 that would cause SQUID to deliver the wrong Varied response
 back to the user.

Simply because Apache explicitly tells it to do exactly that in its 304
response.

 A compressed version of an entity IS the same entity...

Nope. It's a different representation of the same resource, but not
the same entity in terms of HTTP. This is the key difference between
Content-Encoding and Transfer-Encoding.

Content-Encoding is a property of the entity.

Transfer-Encoding is a property of how the message is sent, just like
chunked, with no implications on the entity.

The problem arises from trying to use Content-Encoding as if it was
Transfer-Encoding.

Many years ago we had the same discussion about Vary, and when dust
settled all understood the problem about not sending correct Vary in the
responses. Now as the cache implementation is evolving we are hitting
the exact same problem again in a different form this time due to ETag
collisions. I am sorry that we did not realize the full extent of the
brokenness of these responses the first time when Vary was discussed.

 for
 all intents and purposes... it just has compression 
 applied. One cannot possibly become stale without the
 other also being stale at the same exact moment in time.

HTTP does not make this strict freshness relation between entities of
the same URI, but that's a different question and generally not a big
problem.

 At the moment... yes... I do... but if you read my other posts I
 also have a feeling the reason I can't quote you Verse and Chapter
 from an RFC is because I have a sneaking suspicion that there
 is something missing from the ETag/Vary scheme that can 
 lead to problems like this... and it's NOT IN ANY RFC YET.

And what I am saying is that Apache mod_deflate is violating a MUST
level requirement on ETag in the RFC, thereby making the caching section
of the same RFC break down.

 In other words... you may be doing exactly what hours and hours
 of reading an RFC seems to be telling you you SHOULD do... but
 there still might be something else that OUGHT to be done.

And I am telling you that this part of the RFC is complete, save for the
small detail that the server can not signal that both the compressed and
identity encoding becomes stale when one changes, only one at a time.

 There will always be the chance that some upstream server will
 ( mistakenly? ) keep the same (strong) ETag on a compressed
 variant.

True, there will always be non-compliant implementations out there in
various forms, and they will continue causing problems at least for as
long as it's about MUST level violations. In many cases (this one
included) workarounds can be found, but that does not justify
non-compliant implementations intentionally remaining non-compliant
once informed about the problem.

 People are not perfect and they make mistakes. I still
 think that even when that happens any caching software should
 follow the "be lenient in what you accept and strict in what you
 send" rule and still use the other information available to it

Which in this case is none. The only information we ever get from Apache
is the ETag of the supposedly valid to use response, and possibly new
freshness details about the same.

 ( sic: What the client really asked for and expects ) and 
 do the right thing. Only the cache knows 

Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Justin Erenkrantz

On 12/8/06, Henrik Nordstrom [EMAIL PROTECTED] wrote:

No, that won't work. You would still be just as non-conforming by doing that.
But if mod_deflate may produce different octet-level results on
different requests for the same original entity then it must do this in
addition to other transforms of the ETag.

The identity and gzip encodings are not bidirectionally semantically
equivalent, and additionally normal conditional comparing W/X to X
is true.


Uh, no, they *are* semantically equivalent - but, yes, not
syntactically (bit-for-bit) equivalent.  You inflate the response and
you get exactly what the ETag originally represented.


See RFC 2616 3.3.3 Weak and Strong Validators

You must make the value of the ETag differ between the two entities.


mod_deflate is clearly only doing a semantic (weak) transformation.  -- justin


Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Justin Erenkrantz

On 12/8/06, Henrik Nordstrom [EMAIL PROTECTED] wrote:

The protocol is quite fine as it is, and not easy to change. As it is
now it's mainly a matter of understanding that mod_deflate does create a
completely new entity from the original one. To the protocol it's
exactly the same as when using mod_negotiate and having both the
identity and gzip encoded entities on disk. The fact that you do this
encoding on the fly is of no concern to HTTP.


mod_deflate is certainly not creating a new resource - it's modifying
the representation.  There is no legitimate reason for it to modify
the ETag other than to mark it as weak.  That the caching bits in the
RFC didn't understand this speak to the fact that it's quite subtle.
-- justin


Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Henrik Nordstrom
Fri 2006-12-08 at 14:47 +0100, Justin Erenkrantz wrote:

 mod_deflate is certainly not creating a new resource

It is creating a new HTTP entity. Not a new object on your server, but
still a new unique HTTP entity with different characteristics from the
identity encoding.

If we were talking about transfer-encoding then you would be correct as
it only alters the encoding for transfer purposes and not the HTTP entity
as such, but this is content-encoding. Content encoding is a property of
the response entity.

The main reason why things get blurred is because the creation of this
entity is done on the fly instead of creating a new resource on the
server like HTTP expects. As a result you need to be very careful with the
ETag and Content-Location headers.

Not modifying ETag (including just making it weak) says that the
identity and gzip encodings are semantically equivalent, and can be
exchanged freely. In other words, it says it's fine to send gzip encoding to
all clients (which we all know it's not).

Not modifying/removing Content-Location is less harmful but will cause
cache bouncing, as each time the cache sees a new response entity for a
given URI any older ones with the same Content-Location will get removed
from the cache.

Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Henrik Nordstrom
Fri 2006-12-08 at 14:40 +0100, Justin Erenkrantz wrote:

 Uh, no, they *are* semantically equivalent - but, yes, not
 syntactically (bit-for-bit) equivalent.  You inflate the response and
 you get exactly what the ETag originally represented.

Two entities are only semantically equivalent if they can be interchanged
freely at the HTTP level with no semantic difference in the end-user
result.

Identity and gzip encoding can not be said to bidirectionally have the
same semantic meaning as a gzip encoded entity is pure rubbish to a
recipient not understanding gzip. No more than a Swedish translation of
a document could be said to be semantically equivalent to a Greek
translation of the same document.

Content-Encoding is a case of unidirectional semantic equivalence where
the identity encoding can be substituted for the gzip encoding with kept
semantics, but for ETag bidirectional semantic equivalence is required
which is not fulfilled as gzip encoding can not be substituted for
identity encoding without risking a significant semantic difference to
the recipient.

The only real difference of a weak etag compared to a strong one is that
the weak one does not guarantee octet equality. All other restrictions
apply. Plus a bunch of protocol restrictions where weak etags are not
allowed to be used.

Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Henrik Nordstrom
Thu 2006-12-07 at 02:42 +0100, Justin Erenkrantz wrote:

 -1 on adding semantic junk to the existing ETag (and keeping it
 strong); that's blatantly uncool.  Any generated ETag from mod_deflate
 should either be the original strong version or a weak version of any
 previous etag.  mod_deflate by *definition* is just creating a weak
 version of the prior entity.

You basically only have two choices:

a) Make mod_deflate not send an ETag on modified responses.

b) Modify the value (within the quotes) of the ETag somehow. And if
mod_deflate can not be trusted to always return the same octet
representation, make sure to use a weak ETag, unless the ETag generation
is also tightly coupled to the octet representation, guaranteeing a
different ETag should mod_deflate encode slightly differently.

And to be fully compliant you also need to pay attention to the
Content-Location header. Here I don't see much choice but to not send
Content-Location in mod_deflate mangled responses (but can be kept on
the original response, no problem there).

RFC 2616 13.6 Caching Negotiated Responses, last paragraph.
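
Option b) together with the Content-Location advice can be sketched as
follows. This is a hypothetical Python illustration, not mod_deflate
code: the "-gzip" suffix, the function names and the plain dict of
headers are all assumptions made for the example.

```python
def variant_etag(etag, encoding):
    """Append '-<encoding>' inside the quotes, keeping any W/ prefix."""
    weak = etag.startswith("W/")
    quoted = etag[2:] if weak else etag
    if not (len(quoted) >= 2 and quoted[0] == '"' and quoted[-1] == '"'):
        raise ValueError("not a quoted entity tag: %r" % etag)
    new = '"%s-%s"' % (quoted[1:-1], encoding)
    return ("W/" if weak else "") + new

def headers_after_gzip(headers):
    """Return a copy of the response headers as they might look once the
    body has been gzip content-encoded on the fly."""
    out = dict(headers)
    if "ETag" in out:
        out["ETag"] = variant_etag(out["ETag"], "gzip")
    # The original Content-Location names the identity entity, so drop it.
    out.pop("Content-Location", None)
    out["Content-Encoding"] = "gzip"
    return out

print(headers_after_gzip({
    "ETag": '"6bf1f7-6-1b6d6340"',
    "Content-Location": "/index.html",
}))
```

The original, untransformed response keeps its own ETag and
Content-Location; only the re-encoded response is touched.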

 mod_deflate does properly stick in the Vary header, so caches already
 have enough knowledge to know what's going on anyway even without a
 fix.  (This is probably why mod_cache doesn't flag it as an error.)

 My opinion is to fix the protocol and move on...  -- justin

The protocol is quite fine as it is, and not easy to change. As it is
now it's mainly a matter of understanding that mod_deflate does create a
completely new entity from the original one. To the protocol it's
exactly the same as when using mod_negotiate and having both the
identity and gzip encoded entities on disk. The fact that you do this
encoding on the fly is of no concern to HTTP.

Another option is to explore the use of gzip transfer encoding instead of
content encoding. In transfer encoding none of these problems apply, as
it's done on the transport level and not the entity level, but it's not that
well supported in clients, unfortunately.

Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Roy T. Fielding
Argh, my stupid ISP is losing apache email again because they use  
spamcop.


On Dec 7, 2006, at 2:45 PM, Henrik Nordstrom wrote:

Thu 2006-12-07 at 02:42 +0100, Justin Erenkrantz wrote:

-1 on adding semantic junk to the existing ETag (and keeping it
strong); that's blatantly uncool.  Any generated ETag from mod_deflate
should either be the original strong version or a weak version of any
previous etag.  mod_deflate by *definition* is just creating a weak
version of the prior entity.


No, it is changing the content-encoding value, which is changing the
entity.  The purpose of etag for caching is two-fold: 1) for freshness
checks, and 2) handling conditional range/authoring requests.  That is
why the spec is full of gobbledygook on etag handling -- it was
stretched at the last minute to reuse a very simple freshness check as
a form of variant identifier.

What we should be doing is sending transfer-encoding, not content-encoding,
and get past the chicken and egg dilemma of that feature in HTTP.
If we are changing content-encoding, then we must behave as if there
are two different files on the server representing the resource.
That means tweaking the etag and being prepared to handle that tweak
on future conditional requests.

In other words, Henrik has it right.  It is our responsibility to
assign different etags to different variants because doing otherwise
may result in errors on shared caches that use the etag as a variant
identifier.

Roy


Re: Wrong etag sent with mod_deflate

2006-12-08 Thread TOKILEY
 In other words, Henrik has it right.  It is our responsibility to
 assign different etags to different variants because doing otherwise
 may result in errors on shared caches that use the etag as a variant
 identifier.

Henrik is trying to make it sound like it is all Apache's fault.
It is not.
SQUID is screwing up, too.

...shared caches that use the etag as a variant identifier.

To ONLY ever use ETag as the end-all-be-all for variant
identification is, itself, a mistake.

If the Vary: field is present... then THAT is what the entity
(also) Varies: on and to ignore that and only rely on ETag
is a screw-up.

I had this argument years ago with folks at the SQUID forum.
It was just prior to when they ( finally ) got around to adding any
support for Vary: at all but (limited) support for ETag:.

Regardless of whether it's DCE ( Dynamic Content Encoding )
or not... if the entity Varies: on Content-encoding: but some
cache software is ignoring that just because its ETag matches
some other stored variant... well... that's just WRONG.

Both pieces of software ( SQUID and Apache ) need just a 
little more code to finally get it right.

Don't forget about Content-Length, either. 
If 2 different responses for the same requested entity come
back with 2 different Content-Lengths and there is no Vary:
or ETag then regardless of any other protocol semantics the 
only SANE thing for any caching software to do is to recognize
that, assume it is not a mistake, and REPLACE the existing 
entity with the new one.

Yea.. sure... you might get a lot of cache bounce that way but
at least you are returning a fresh copy.

It is not possible for 2 EXACTLY identical representations of the
same requested entity to have different content lengths.
If the lengths are different, then SOMETHING is different with
regards to what you have in your cache.

To ignore that reality as well ( which most caching software
does ) is just kinda stupid.

No protocol ( sic: set of rules ) can ever cover all the realities.
( Good ) software knows how to make common sense
as well.

Yours...
Kevin Kiley

 


Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Henrik Nordstrom
Fri 2006-12-08 at 15:03 -0500, [EMAIL PROTECTED] wrote:

 To ONLY ever use ETag as the end-all-be-all for variant
 identification is, itself, a mistake.

Well, this area of the HTTP specs is pretty clear in my eyes, but then
I have read it up and down too many times unwinding the tangled web
which is found in there.

An entity (including encoding) is identified by request URI +
Content-Location.

A specific version of an entity is identified by its unique ETag.

Vary: tells which headers the server used in server driven negotiation
of which entity to respond with. Accept-Encoding is one input to this.

A strong ETag must be unique among all variants of a given URI, that is
all different forms of entities that may reside under the URI and all
their past and future versions.

A weak ETag may be shared by two variants/versions if and only if they
can be considered semantically equivalent and mutually exchangeable at
the HTTP level with no semantic loss. For example different levels of
compression, or minor changes of negligible or no importance to the
semantics of the resource (hit counter example in the specs).
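
The comparison rules behind this distinction (RFC 2616 section 13.3.3)
can be sketched in Python as below. The function names are made up for
the example, and the parser assumes well-formed tags.

```python
def parse_etag(tag):
    """Split an entity tag into (is_weak, quoted_value)."""
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag

def strong_match(a, b):
    """Strong comparison: identical values and neither tag is weak."""
    weak_a, val_a = parse_etag(a)
    weak_b, val_b = parse_etag(b)
    return not weak_a and not weak_b and val_a == val_b

def weak_match(a, b):
    """Weak comparison: identical values, weakness flags ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]

print(strong_match('"X"', '"X"'), strong_match('W/"X"', '"X"'))
print(weak_match('W/"X"', '"X"'), weak_match('"X"', '"Y"'))
```

Under the weak comparison W/"X" and "X" match, which is why merely
marking the gzip variant's tag as weak does not separate it from the
identity variant.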
 
 Both pieces of software ( SQUID and Apache ) need just a 
 little more code to finally get it right.

It's correct that the current Squid implementation is not flawless. Most
notably it has very poor handling of cache invalidations at the moment.
 
 Don't forget about Content-Length, either. 
 If 2 different responses for the same requested entity come
 back with 2 different Content-Lengths and there is no Vary:
 or ETag then regardless of any other protocol semantics the 
 only SANE thing for any caching software to do is to recognize
 that, assume it is not a mistake, and REPLACE the existing 
 entity with the new one.

Caches tend to by nature replace what they have with what they get.

 Yea.. sure... you might get a lot of cache bounce that way but
 at least you are returning a fresh copy.

How would Content-Length changes cause cache bouncing?

 It is not possible for 2 EXACTLY identical representations of the
 same requested entity to have different content lengths.
 If the lengths are different, then SOMETHING is different with
 regards to what you have in your cache.

Yes, but when would this be seen?

We only get the ETag from Apache, not the Content-Length. The specs forbid
Apache from sending the Content-Length or other entity headers in 304
responses partly to make sure entities do not get corrupted by errors in
the origin server side implementation of server driven content
negotiation.

 No protocol ( sic: set of rules ) can ever cover all the realities.
 ( Good ) software knows how to make common sense
 as well.

Indeed, and that is why we are going slow on implementing the more advanced
features of the specs. But violating MUST level protocol requirements is
not common sense. And if you actually follow the specs these parts do
make great sense once you get the picture that ETags MUST be unique for
all entity versions of a given URI. The only poor part I have seen in
this area of the specs is that the If-None-Match condition is perhaps a
bit blunt only telling the end results, the ETag of the valid response
entity of a negotiated resource, not how the server came to that
conclusion. This adds a bit more roundtrips to the origin than would be
required only to figure out that Content-Language: en is ok both for
Accept-Language: en and Accept-Language: en, sv, but that's about it.
(Yes, I intentionally avoided Accept-Encoding here to illustrate the
point; the mechanism is exactly the same, however.)

RFC 2616 3.11 Entity Tags

   A strong entity tag MAY be shared by two entities of a resource
   only if they are equivalent by octet equality.

   An entity tag MUST be unique across all versions of all entities
   associated with a particular resource. A given entity tag value MAY


See also 14.26 If-None-Match, and numerous other references to ETag.

I can bombard you with long chains of supporting claims from the RFC if
you like, depending on which parts of the equation you feel are loosely
connected. Just tell me which part you don't trust and I'll happily help
you see the light.

a) That identity and gzip content-encoding of the same resource
represents different entities of the same resource

b) That different entities of the same resource MUST have different
(strong) ETags.

c) That gzip and identity encodings are not semantically equivalent.

d) That the weak ETag W/X is semantically equivalent to the strong
ETag X with the same quoted value.

Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Henrik Nordstrom
Fri 2006-12-08 at 11:44 -0800, Roy T. Fielding wrote:

 In other words, Henrik has it right.  It is our responsibility to
 assign different etags to different variants because doing otherwise
 may result in errors on shared caches that use the etag as a variant
 identifier.

Thanks ;-)

Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Henrik Nordstrom
Fri 2006-12-08 at 22:28 +0100, Henrik Nordstrom wrote:

 A strong ETag must be unique among all variants of a given URI, that is
 all different forms of entities that may reside under the URI and all
 their past and future versions.

Forgot the last piece there which clears many doubts:

Entities from different URIs may share the same ETag (or even
Content-Location) with no implications on any form of equivalence
between the two.

Also I am sorry that my use of terms is a bit messed up wrt entity vs
variant vs version, but so are the specs.

Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Justin Erenkrantz

On 12/8/06, Roy T. Fielding [EMAIL PROTECTED] wrote:

What we should be doing is sending transfer-encoding, not
content-encoding, and get past the chicken and egg dilemma of that
feature in HTTP.
If we are changing content-encoding, then we must behave as if there
are two different files on the server representing the resource.
That means tweaking the etag and being prepared to handle that tweak
on future conditional requests.


There's just no way to know how to handle any ETag modification on
future requests.  So, that's a non-starter.  Therefore, any fix for
this edge case which breaks cacheability in the common case of real
browsers I would find unacceptable.


In other words, Henrik has it right.  It is our responsibility to
assign different etags to different variants because doing otherwise
may result in errors on shared caches that use the etag as a variant
identifier.


As Kevin mentioned, Squid is only using the ETag and is ignoring the
Vary header.  That's the crux of the broken behavior on their part.
If they want to point out minor RFC violations in Apache, then we can
play that game as well.  (mod_cache deals with this Vary/ETag case
just fine, FWIW.)

The compromise I'd be willing to accept is to have mod_deflate support
the 'TE: gzip' request header and add 'gzip' to the Transfer-Encoding
bit - and to prefer that over any Accept-Encoding bits that are sent.
The ETag can clearly remain the same in that case - even as a strong
ETag.  So, Squid can change to send along TE: gzip (if it isn't
already).  And, everyone else who sends Accept-Encoding gets the
result in a way that doesn't pooch their cache if they try to do a
later conditional request.

Is that acceptable?  -- justin
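
The preference order proposed above (honour 'TE: gzip' first, fall back to Accept-Encoding) might be sketched as follows. This is only an illustration in Python, not mod_deflate code: the function names and the plain dict of request headers (with exact key casing) are assumptions made for the example.

```python
def accepts(value, coding):
    """True if `coding` is listed in a comma-separated header with q > 0."""
    for item in (value or "").split(","):
        parts = [p.strip().lower() for p in item.split(";")]
        if parts[0] != coding:
            continue
        for param in parts[1:]:
            name, _, qvalue = param.partition("=")
            if name.strip() == "q":
                try:
                    return float(qvalue) > 0
                except ValueError:
                    return False
        return True  # listed without a q parameter
    return False


def choose_gzip_mode(request_headers):
    """Return 'transfer', 'content', or None for a dict of request headers.

    'transfer' means gzip as a Transfer-Encoding (ETag left unchanged);
    'content' means gzip as a Content-Encoding (a different entity).
    """
    if accepts(request_headers.get("TE"), "gzip"):
        return "transfer"
    if accepts(request_headers.get("Accept-Encoding"), "gzip"):
        return "content"
    return None


print(choose_gzip_mode({"TE": "gzip", "Accept-Encoding": "gzip"}))
print(choose_gzip_mode({"Accept-Encoding": "gzip, deflate"}))
```

The first call selects the transfer-encoding path even though Accept-Encoding is also present; the second falls back to content-encoding.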


Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Justin Erenkrantz

On 12/8/06, Henrik Nordstrom [EMAIL PROTECTED] wrote:

A strong ETag must be unique among all variants of a given URI, that is
all different forms of entities that may reside under the URI and all
their past and future versions.

A weak ETag may be shared by two variants/versions if and only if they
can be considered semantically equivalent and mutually exchangeable at
the HTTP level with no semantic loss. For example different levels of
compression, or minor changes of negligible or no importance to the
semantics of the resource (hit counter example in the specs).


I think we all (hopefully) agree that a weak ETag is ideally what
mod_deflate should add.  But, the specs simply dropped the ball here
as doing that breaks conditional requests.  If we could issue a weak
ETag and have it work for conditional requests, this would be easy and
be done by now.

We can't, so I would much prefer that we don't break conditional
requests just because mod_deflate is in use.  I also don't believe we
can come up with a reversible ETag semantic without rewriting big
chunks of code or introducing butt-ugly hacks.  Apache has always
treated the ETag as opaque (except for W/) - to do otherwise is to
bust large assumptions.  -- justin


Re: Wrong etag sent with mod_deflate

2006-12-08 Thread Roy T. Fielding

On Dec 8, 2006, at 3:35 PM, Justin Erenkrantz wrote:


On 12/8/06, Roy T. Fielding [EMAIL PROTECTED] wrote:

What we should be doing is sending transfer-encoding, not
content-encoding, and get past the chicken and egg dilemma of that
feature in HTTP.
If we are changing content-encoding, then we must behave as if there
are two different files on the server representing the resource.
That means tweaking the etag and being prepared to handle that tweak
on future conditional requests.


There's just no way to know how to handle any ETag modification on
future requests.  So, that's a non-starter.  Therefore, any fix for
this edge case which breaks cacheability in the common case of real
browsers I would find unacceptable.


It isn't necessary to handle any ETag modification -- our ETag
generation is fairly limited and is not opaque to the server.
We only need to avoid conflicts between the content-encoded variant
and the non-encoded variant, which is guaranteed if the encoded
variant has -gzip appended to the existing entity-tag.  That will
work fine with the common case of real browsers -- far better than
the current case which will deliver invalid content if a browser
tries to complete a partial download from a cache.
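
As a rough sketch of the "-gzip appended" idea, assuming entity-tags are handled as quoted strings: the helper names below are hypothetical and this is Python for illustration, not the httpd implementation.

```python
GZIP_SUFFIX = "-gzip"  # the suffix named in the message above


def encoded_etag(etag):
    """Return the entity-tag for the gzip variant of `etag`.

    `etag` is a quoted entity-tag such as '"abc123"' or 'W/"abc123"'.
    The suffix goes inside the closing quote.
    """
    if not etag.endswith('"'):
        raise ValueError("entity-tag must end with a double quote")
    return etag[:-1] + GZIP_SUFFIX + '"'


def original_etag(etag):
    """Undo encoded_etag(); return None if `etag` lacks the gzip suffix."""
    tail = GZIP_SUFFIX + '"'
    if etag.endswith(tail):
        return etag[:-len(tail)] + '"'
    return None


print(encoded_etag('"abc123"'))        # "abc123-gzip"
print(original_etag('"abc123-gzip"'))  # "abc123"
print(original_etag('"abc123"'))       # None
```

On a later conditional request, a server could map the suffixed tag back with something like original_etag() before comparing it against the tag of the unencoded file. The sketch does not cover an original tag that already happens to end in "-gzip".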


In other words, Henrik has it right.  It is our responsibility to
assign different etags to different variants because doing otherwise
may result in errors on shared caches that use the etag as a variant
identifier.


As Kevin mentioned, Squid is only using the ETag and is ignoring the
Vary header.  That's the crux of the broken behavior on their part.


Then they will still be broken regardless of what we do here.  It simply
isn't a relevant issue.


If they want to point out minor RFC violations in Apache, then we can
play that game as well.  (mod_cache deals with this Vary/ETag case
just fine, FWIW.)


Unlike Squid, RFC compliance is part of our mission, at least when
it isn't due to a bug in the spec.  This is not a bug in the spec.

A high-efficiency response cache is expected to have multiple
representations of a given resource cached.  The cache key is the
URI.  If the set of varying header field values that generated the
cached response is different from the request set, then a
conditional GET request is made containing ALL of the cached
entity tags in an If-None-Match field (in accordance with the Vary
requirements).  If the server says that any one of the representations,
as indicated by the ETag in a 304 response, is okay, then the cached
representation with that entity tag is sent to the user-agent
regardless of the Vary calculation.  In short, if we have two active
representations that have the same etag, then we have violated the
spec and created an unnecessary interoperability problem:

   If the selecting request header fields for the cached entry do not
   match the selecting request header fields of the new request, then
   the cache MUST NOT use a cached entry to satisfy the request unless
   it first relays the new request to the origin server in a
   conditional request and the server responds with 304 (Not Modified),
   including an entity tag or Content-Location that indicates the
   entity to be used.

   If an entity tag was assigned to a cached representation, the
   forwarded request SHOULD be conditional and include the entity tags
   in an If-None-Match header field from all its cache entries for the
   resource. This conveys to the server the set of entities currently
   held by the cache, so that if any one of these entities matches the
   requested entity, the server can use the ETag header field in its
   304 (Not Modified) response to tell the cache which entry is
   appropriate.

   If the entity-tag of the new response matches that of an existing
   entry, the new response SHOULD be used to update the header fields
   of the existing entry, and the result MUST be returned to the client.

In other words, the conditional request containing all of the entity
tags satisfies the semantics of Vary when the server responds with
304 and one of those entity tags.
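
A toy model of that selection step, in Python, may make the failure mode concrete. The VariantCache class and its method names are invented for this illustration; a real cache would also track Vary, freshness, and headers.

```python
class VariantCache:
    """Toy cache: one URI maps to several stored variants, keyed by ETag."""

    def __init__(self):
        self.entries = {}  # uri -> {etag: body}

    def store(self, uri, etag, body):
        self.entries.setdefault(uri, {})[etag] = body

    def if_none_match(self, uri):
        """Header value listing every cached entity tag for this URI."""
        return ", ".join(self.entries.get(uri, {}))

    def select(self, uri, etag_from_304):
        """Pick the cached variant the origin named in its 304 response."""
        return self.entries[uri][etag_from_304]


# Distinct tags: the 304's ETag unambiguously names one variant.
cache = VariantCache()
cache.store("/page", '"v1"', b"plain body")
cache.store("/page", '"v1-gzip"', b"gzip body")
print(cache.if_none_match("/page"))
print(cache.select("/page", '"v1-gzip"'))

# Shared tag: the second store replaces the first, so the cache can no
# longer tell the identity and gzip variants apart.
broken = VariantCache()
broken.store("/page", '"v1"', b"plain body")
broken.store("/page", '"v1"', b"gzip body")
print(len(broken.entries["/page"]))
```

With a shared tag the cache ends up holding a single entry for the URI, so whichever variant was stored last is served for every request that revalidates with that tag.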

And, no, mod_cache doesn't deal with it -- it just isn't a
very efficient cache.


The compromise I'd be willing to accept is to have mod_deflate support
the 'TE: gzip' request header and add 'gzip' to the Transfer-Encoding
bit - and to prefer that over any Accept-Encoding bits that are sent.
The ETag can clearly remain the same in that case - even as a strong
ETag.  So, Squid can change to send along TE: gzip (if it isn't
already).  And, everyone else who sends Accept-Encoding gets the
result in a way that doesn't pooch their cache if they try to do a
later conditional request.

Is that acceptable?  -- justin


The best solution is to not mess with content-encoding at all, which
gets us out of both this consistency problem and related problems
with the entity-header fields (content-md5, signatures, etc.).
That is why transfer encoding was invented in the first place.

We should have an 

Re: Wrong etag sent with mod_deflate

2006-12-07 Thread Henrik Nordstrom
On Thu 2006-12-07 at 02:31 +0100, Justin Erenkrantz wrote:

 mod_deflate should just add the W/ prefix if it's not already there.
 -- justin

No, that won't work. You would still be just as non-conforming by doing
that. But if mod_deflate may produce different octet-level results on
different requests for the same original entity, then it must do this in
addition to other transforms of the ETag.

The identity and gzip encodings are not bidirectionally semantically
equivalent, and additionally a normal conditional comparison of W/X to X
is true.

See RFC 2616 13.3.3 Weak and Strong Validators

You must make the value of the ETag differ between the two entities.
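
The two comparison functions from that RFC section can be sketched in a few lines of Python to show why the W/ prefix alone is not enough. The function names are chosen for this example only.

```python
def parse_etag(tag):
    """Split an entity-tag into (is_weak, quoted_value)."""
    tag = tag.strip()
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a, b):
    """Strong comparison: both tags must be strong and identical."""
    weak_a, val_a = parse_etag(a)
    weak_b, val_b = parse_etag(b)
    return not weak_a and not weak_b and val_a == val_b


def weak_match(a, b):
    """Weak comparison: quoted values identical, W/ prefix ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]


identity_tag = '"abc123"'
weakened_tag = 'W/"abc123"'     # only the W/ prefix added
distinct_tag = '"abc123-gzip"'  # value inside the quotes changed

print(weak_match(identity_tag, weakened_tag))    # True
print(strong_match(identity_tag, weakened_tag))  # False
print(weak_match(identity_tag, distinct_tag))    # False
```

Under the weak comparison, W/"abc123" still matches "abc123", so a cache would treat the gzip and identity entities as interchangeable; only changing the quoted value keeps them apart.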

Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-07 Thread Henrik Nordstrom
On Thu 2006-12-07 at 02:42 +0100, Justin Erenkrantz wrote:

 -1 on adding semantic junk to the existing ETag (and keeping it
 strong); that's blatantly uncool.  Any generated ETag from mod_deflate
 should either be the original strong version or a weak version of any
 previous etag.  mod_deflate by *definition* is just creating a weak
 version of the prior entity.

You basically only have two choices:

a) Make mod_deflate not send an ETag on modified responses.

b) Modify the value (within the quotes) of the ETag somehow. And if
mod_deflate cannot be trusted to always return the same octet
representation, make sure to use a weak ETag, unless the ETag generation
is also tightly coupled to the octet representation, guaranteeing a
different ETag should mod_deflate encode slightly differently.

And to be fully compliant you also need to pay attention to the
Content-Location header. Here I don't see much choice but to not send
Content-Location in mod_deflate mangled responses (but can be kept on
the original response, no problem there).

RFC 2616 13.6 Caching Negotiated Responses, last paragraph.

 mod_deflate does properly stick in the Vary header, so caches already
 have enough knowledge to know what's going on anyway even without a
 fix.  (This is probably why mod_cache doesn't flag it as an error.)

 My opinion is to fix the protocol and move on...  -- justin

The protocol is quite fine as it is, and not easy to change. As it is
now it's mainly a matter of understanding that mod_deflate does create a
completely new entity from the original one. To the protocol it's
exactly the same as when using mod_negotiation and having both the
identity and gzip encoded entities on disk. The fact that you do this
encoding on the fly is of no concern to HTTP.

Another option is to explore the use of gzip transfer encoding instead
of content encoding. With transfer encoding none of these problems
apply, as it's done at the transport level and not the entity level, but
unfortunately it's not that well supported in clients.

Regards
Henrik




Re: Wrong etag sent with mod_deflate

2006-12-06 Thread Chris Elving

Roy T. Fielding wrote:


Protocol issues really should be brought up on the dev list, with an
appropriate subject, and not left in bugzilla.


FWIW, there was a dev list thread on this 3 years ago with the subject 
mod_deflate and transfer / content encoding problem.


http://www.mail-archive.com/dev@httpd.apache.org/msg18366.html


Re: Wrong etag sent with mod_deflate

2006-12-06 Thread Justin Erenkrantz

On 12/7/06, Roy T. Fielding [EMAIL PROTECTED] wrote:

Entities gzip:ed by mod_deflate still carry the same ETag as the
plain entity, causing inconsistency in ETag-aware proxy caches.

I'll have a look later and see if I can fix it, but let me know if there
is already a patch in the works (that doesn't rely on mod_filter).


mod_deflate should just add the W/ prefix if it's not already there.  -- justin


Re: Wrong etag sent with mod_deflate

2006-12-06 Thread Justin Erenkrantz

On 12/7/06, Justin Erenkrantz [EMAIL PROTECTED] wrote:

mod_deflate should just add the W/ prefix if it's not already there.  -- justin


But, that'll break caches as we're not allowed to serve If-Match with
weak entity tags.  Feh.

-1 on adding semantic junk to the existing ETag (and keeping it
strong); that's blatantly uncool.  Any generated ETag from mod_deflate
should either be the original strong version or a weak version of any
previous etag.  mod_deflate by *definition* is just creating a weak
version of the prior entity.

mod_deflate does properly stick in the Vary header, so caches already
have enough knowledge to know what's going on anyway even without a
fix.  (This is probably why mod_cache doesn't flag it as an error.)

My opinion is to fix the protocol and move on...  -- justin