Let me preface all comments by saying that I AGREE with BOTH
Roy and Henrik... If Apache is sending the same exact (strong)
ETag value for both a compressed and an identity variant of
the same entity... then, according to current RFC content, 
that is broken behavior and it should be fixed.

You could take the part of the RFC that talks specifically about
how "Weak" ETags might seem "ideal" for compressed variants and
argue it against Henrik's point of view that a compressed
variant should ALWAYS be treated as a separate (unique) HTTP
entity... but I don't want to go there. Not now, anyway.

Personally, I tend to agree that even when DCE is
employed ( Dynamic Content Encoding ), any code that is
doing DCE ( versus Transfer Encoding ) should make that
dynamically generated entity "appear" as if it were simply
a separate, disk-based resource.

DCE is, after all, just a "magic trick". It is making it
APPEAR to end-users as if compressed variants of entities
actually physically exist and are being sent back to 
anyone ready/able/willing to receive them...

...and it's a GOOD TRICK, when done correctly.

>> Roy wrote
>>
>> In other words, Henrik has it right.  It is our responsibility to
>> assign different etags to different variants because doing otherwise
>> may result in errors on shared caches that use the etag as a variant
>> identifier.

See above. Totally agree.

>> Justin wrote...
>>
>> As Kevin mentioned, Squid is only using the ETag and is ignoring the
>> Vary header. That's the crux of the broken behavior on their part.
>
> Roy wrote...
>
> Then they will still be broken regardless of what we do here.  It simply
> isn't a relevant issue.

It's relevant to the extent that I think there are still some things
missing from the RFCs with regard to all this, which is why a piece
of software like Squid might be "doing the wrong thing" as well.

The best way I can "elaborate" on that feeling is to walk
through Roy's scenario...

> Roy wrote...
>
> Unlike Squid, RFC compliance is part of our mission, at least when
> it isn't due to a bug in the spec.  This is not a bug in the spec.
>
> A high-efficiency response cache is expected to have multiple
> representations of a given resource cached.  

No doubt.

> The cache key is the URI.  

Yes.

> If the set of varying header field values that 
> generated the cached response is different from the request set, 

...as when one browser asks for a URI and sends
"Accept-Encoding: gzip" and another asks for the same URI
and does NOT supply "Accept-Encoding: gzip"...

> then a conditional GET request is made containing ALL of the 
> cached entity tags in an If-None-Match field 
> (in accordance with the Vary requirements). 

...and, currently, if the cache has stored both a compressed and
a non-compressed version of the same entity received from Apache
( sic: mod_deflate ), then the same ( strong ) ETag is returned
in the conditional GET for both of the cached variants.

Hmmm... begins to look like a problem... but is it really?... 
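To make the "problem" concrete, here is a rough sketch of the
revalidation step Roy describes: a cache holding two variants of one
URI sends ALL of their ETags in a single If-None-Match field. The
names and values below are purely illustrative, not any real cache's
internals:

```python
# Hypothetical cache state: one URI, two variants, and -- with
# mod_deflate's current behavior -- the SAME strong ETag on both.
cached_variants = {
    "gzip":     {"etag": '"abc123"', "body": b"...compressed..."},
    "identity": {"etag": '"abc123"', "body": b"...identity..."},
}

# Build the conditional GET's If-None-Match from ALL cached ETags,
# per the Vary requirements. Because both variants share one value,
# the list collapses to a single entry:
etags = sorted({v["etag"] for v in cached_variants.values()})
if_none_match = ", ".join(etags)
print(if_none_match)  # '"abc123"'
```

The point of the sketch: once the list collapses, a 304 naming that
ETag can no longer tell the cache *which* variant it was validating.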

> If the server says that any one of the representations,
> as indicated by the ETag in a 304 response, is okay, 

"okay" means "fresh".

In the case of a DCE-encoded variant, an argument could be made
here that it doesn't make a bit of difference whether the ETags for
the compressed and non-compressed variants are the 'same' or
'different'. All the cache really wants to know is: "Is the
ORIGINAL ( uncompressed ) version of this response fresh
or not?"

The compressed variant should ALWAYS be just the encoded version of 
the same "original" uncompressed entity. If the original "uncompressed"
version ( indicated by strong ETag 1 ) is not "fresh" then there is
no possible way for any "compressed" variant of the same entity
( marked by the same strong ETag 1 ) to be "fresh". It's just
not possible.

So, in essence, when the "Vary:" has to do only with compression,
the compressed and uncompressed "variants" are "married" in
a way that, perhaps, is not covered in the existing ETag RFC
specifications. The ETag CAN/SHOULD be the "same" because there
is no way for the original ( strong ETag ) to become "not fresh"
without the other representation also becoming "not fresh".

These kinds of "Variants" are "Synced" in a way perhaps not
( currently ) covered by the ETag specs.

> then the cached representation with that entity tag is sent to 
> the user-agent regardless of the Vary calculation.

"sent to" means the cache has received it's 304 response and 
decided what it CAN/SHOULD send back to the user, right?

Well... if you follow the argument above about how certain
variants are "synced" together, then even if two variants in
the cache share the same strong "ETag"... how can the
cache send back the "wrong thing" or fail to pay attention to
the "Vary calculation" on its end?

I don't know the exact details of the exact "field problem"
that Henrik is trying to solve, but it seems to me that EVEN
THOUGH the "compressed" and "non-compressed" variants might
happen to share the same (strong) ETag... if Squid is delivering
stale compressed variants after revalidation has shown that the
original "identity" variant is "not fresh", then that's just
a colossal screw-up in the caching code itself.

It's the "common sense" element I was talking about.

Regardless of what the server says... how could you ever get
into a situation where you would consider a compressed variant
of an entity "fresh" when the "identity" version is now "stale"? 

> In short, if we have two active
> representations that have the same etag, then we have violated the
> spec and created an unnecessary interoperability problem

Maybe so. Maybe the two variants should ALWAYS have different
ETags... and ( see the start of this message ) I AGREE that they
should... but if you also follow the "walk-through" above, then
something is seriously confused if, even with identical ETags,
the cache is sending back "stale" compressed variants after the
"identity" variant ( the strong ETag value ) has gone "stale".
There's still something missing from the specs, or something.

When an exact, literal interpretation of a spec tends to 
defy common sense... my instinct is to suspect the spec itself.

>> Justin wrote...
>>
>> The compromise I'd be willing to accept is to have mod_deflate support
>> the 'TE: gzip' request header and add 'gzip' to the Transfer-Encoding
>> bit - and to prefer that over any Accept-Encoding bits that are sent.
>> The ETag can clearly remain the same in that case - even as a strong
>> ETag.  So, Squid can change to send along TE: gzip (if it isn't
>> already).  And, everyone else who sends Accept-Encoding gets the
>> result in a way that doesn't pooch their cache if they try to do a
>> later conditional request.
>
> Is that acceptable?  -- justin
>
> Roy wrote...
>
> The best solution is to not mess with content-encoding at all, which
> gets us out of both this consistency problem and related problems
> with the entity-header fields (content-md5, signatures, etc.).
> That is why transfer encoding was invented in the first place.

True... but that is actually just a huge cop-out.

DCE ( Dynamic Content Encoding ) is a valid concept even if it
wasn't sufficiently "imagined" at the time the specs were
codified. It works. It works WELL... and it is something that
OUGHT to always be possible if the RFCs mean anything at all.

As I said in an earlier post... there are only a few lines of
code still missing from Server / Cache / Client software to make
it all work flawlessly. The remaining "gotchas" all have to do
with the "caching" aspects, which, I hope, this discussion might
finally resolve, prompting everyone to take the actions needed
to make DCE ( finally ) work as it should.

One of the main "prime directives" for developing Apache 2.0
at all was to finally re-organize the IO stream so that schemes
like DCE could be done more easily than they already were in
the 1.3.x framework. Mission accomplished.
Filtering was born. It would be a shame to consider abandoning
one of the very concepts that gave birth to Apache 2.0 for
the sake of a few more lines of code that could take it
into the "end zone".

> Roy wrote...
>
> We should have an implementation of deflate as a transfer encoding,
> but it should be configurable independent of the existing filter.
> Some people will want TE specifically to avoid the addition of Vary
> and all the other problems that content-changing filters cause.
> For example, an additional directive option for CE, TE, or "either".

No argument here. Transfer-encoding is about a DECADE overdue now.

> The existing filter needs to modify the ETag field value (and
> any other entity-dependent values that we can think of) 

In the case of compressed entities, it would still be a good idea
to always add a standard header indicating the original
uncompressed content-length ( if it's possible to know it ).

> or be removed as a feature.  Weak etags are not a solution -- being able
> to make range requests of large cached representations requires a
> strong etag, and it really isn't hard to provide one.  It is better
> to not deflate the response at all than to interfere with caching.

If Transfer-encoding ever becomes a reality, you will see the need
for DCE decrease. It is actually the CACHES themselves that need
TE capability more than the Server/Cache sub-links.

More often than not... it is the CACHES that are handling the
"last mile", which is where compression makes the biggest difference.

Yours...
Kevin Kiley
