Re: Content-encoding, gzip, and all that stuff

James Barwick Sat, 09 Jun 2001 15:56:00 -0700
Having read everyone's messages, I'd like to ask the $10K question.....

I found the Accept-Encoding section in the Mozilla preferences...and 
cleared it....however, .gz's still come down uncompressed.  How can we 
prevent that?  Every file I get now has to be recompressed.  I'm almost at
the point of writing a script and calling it from cron every 10 
minutes to look through my download directory(ies) for .tar's and gzip them.
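
For what it's worth, a rough sketch of that cron workaround might look like the following. The function name and the directory passed to it are placeholders of mine, and it assumes that every bare .tar in the directory really is an archive that was decompressed on the way down.

```python
import gzip
import shutil
from pathlib import Path


def recompress_tars(download_dir: str) -> list[str]:
    """Gzip every bare .tar file in download_dir; return the names written."""
    written = []
    for tar_path in sorted(Path(download_dir).glob("*.tar")):
        gz_path = tar_path.with_name(tar_path.name + ".gz")
        if gz_path.exists():
            continue  # never clobber an archive that is already there
        with open(tar_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large tars are fine
        tar_path.unlink()  # drop the uncompressed copy once the .gz is written
        written.append(gz_path.name)
    return written
```

Calling recompress_tars() on the download directory from a small wrapper script is all cron would need to run. It only papers over the problem, of course; the browser is still undoing the compression.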

Now a further question.  If receiving a .gz file and Mozilla 
automatically uncompresses it...and the download save box says 
application/x-tar....why does it set the filename to x.tar.gz... 
shouldn't it remove the gz for you if it's going to uncompress it???
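
To make the question concrete, here is a sketch of the save decision I would expect from a client. It is not how Mozilla actually behaves; save_plan, codings and the always_decode switch are names I made up, and the ".gz in the URL means keep it compressed" rule is the heuristic Dean mentions in the quoted text below.

```python
def codings(value):
    """Split a header value such as 'gzip, chunked' into lowercase tokens."""
    return [c.strip().lower() for c in value.split(",") if c.strip()]


def save_plan(url_name, headers, always_decode=False):
    """Return (decompress, filename) for a downloaded response."""
    if "gzip" in codings(headers.get("Transfer-Encoding", "")):
        # Compressed for transfer only: always undo it, the name is unaffected.
        return True, url_name
    if "gzip" in codings(headers.get("Content-Encoding", "")):
        if url_name.endswith(".gz") and not always_decode:
            # The URL names a compressed file, so store the bytes untouched.
            return False, url_name
        # If the client insists on decoding, the saved name should lose ".gz".
        stripped = url_name[:-3] if url_name.endswith(".gz") else url_name
        return True, stripped
    return False, url_name
```

Under this sketch, x.tar.gz served with "Content-Encoding: gzip" is either saved as-is under its own name, or decoded and saved as x.tar. It never ends up as an uncompressed file that is still called x.tar.gz.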

This is very troubling and a big pain in the #$*%.



Dean Gaudet wrote:

> 
> On Wed, 6 May 1998, Eric Bina wrote:
> 
> 
>>Well, I want to enforce a particular interpretation of
>>Accept-Encoding, and you want to enforce a particular 
>>interpretation of Transfer-Encoding.  My reading of the
>>spec still implies to me that Transfer-Encoding is not
>>what I am talking about here because the server is serving
>>an existing file, not transforming a message at the
>>time of transfer.  However, I am very willing to use
>>Transfer-Encoding to reach the same ends.  Perhaps we
>>should use both.
>>
> 
> While the server may have it already compressed, you can consider that an
> optimization for server performance.  With a small modification, folks
> could have both gzipped and ungzipped versions of everything, and apache
> could serve appropriately based on the TE header.  It could also serve
> appropriately based on the Accept-Encoding header.  In both cases though,
> apache would have to negotiate -- which means the URL *shouldn't* include
> any of the file extensions.
> 
> In the Accept-Encoding case you wouldn't know if the file is gzipped on
> the server for bandwidth reasons, or if it's supposed to be compressed. 
> With TE you do know -- the server is telling you it's compressed for
> transfer and should be decompressed for "presentation" to the user (be
> that on screen or on disk). 
> 
> [Yes there are obvious optimizations/features to be applied here; keep
> caches of compressed files, maintain that all automatically, and so forth. 
> But that's not likely in the apache 1.x timeframe... but *is* on a few of
> our lists for apache 2.0.]
> 
> In any event, if you refer to /foo.tar.gz, then apache will serve it as
> is, no negotiation at all.  It's when you refer to /foo.tar or /foo that
> Apache starts to do negotiation.  So folks designing the download pages
> for an application file, suppose foo.xls (an excel spreadsheet), could
> place these files on disk (this is with the hypothetical easy mods
> named above):
> 
>     foo.xls.identity
>     foo.xls.gz
> 
> and then refer to /foo.xls.  Apache would negotiate between the two.  For
> best effect we would make it serve foo.xls.identity when no Accept-Encoding
> or TE header is supplied.  We'd serve foo.xls.gz if the browser sent
> "Accept-Encoding: gzip" or "TE: gzip".  The big difference between these
> cases is that in the latter case the browser knows for certain that the
> compression is for transfer reasons only.
> 
> These are all details of the implementation within Apache.  I'd rather we
> talk more generally about how an arbitrary server reacts to these queries...
> and I really think TE is the way to go (partially because that's what Roy
> Fielding said to me when I asked him about it... and I tend to trust Roy's
> judgement on HTTP interoperability issues ;) 
> 
> 
>>>Suppose the url /foo.gz maps to a file which exists, and the browser sends
>>>along "TE: gzip" *and no Accept-Encoding header*.  In this case Apache
>>>would respond like this, because the file exists as named it does *no
>>>negotiation*:
>>>
>>>    Content-Type: whatever
>>>    Content-Encoding: gzip
>>>    Content-Length: 1234
>>>
>>>As you suggest, Apache could be changed to serve it as:
>>>
>>>    Content-Type: application/gzip
>>>    Content-Length: 1234
>>>
>>>I believe there is a reason for the former choice, and the reason has to
>>>do with legacy support for older versions of various clients (including
>>>navigator), *but I haven't researched the reason*.  I don't think it's
>>>important, we can change it in a development version and see what
>>>breaks (and document the breakage this time around).
>>>
>>>Suppose the url "/foo" maps to a file which doesn't exist, but /foo.gz
>>>still exists, in this case Apache will negotiate, right now it doesn't
>>>understand TE, so it responds:
>>>
>>>    Content-Type: whatever
>>>    Content-Encoding: gzip
>>>    Content-Length: 1234
>>>
>>>The client could use heuristics to guess this shouldn't be stored on
>>>disk compressed because the url doesn't contain .gz.
>>>
>>>But once Apache is changed to understand TE it could respond:
>>>
>>>    Content-Type: whatever
>>>    Transfer-Encoding: gzip, chunked
>>>
>>>Because the TE header provides it the chance to negotiate a compressed
>>>transfer of /foo to the client, and the client knows that it was
>>>compressed only for transfer.  Note that there's no Content-Length in
>>>this case -- I'm pretty sure it would be wrong to send 1234 as the
>>>C-L here because the C-L is supposed to be the length of the message
>>>after removing the transfer-encodings... and the uncompressed number
>>>may not be easily available (perhaps it is in the gzip header, dunno).
>>>
>>>So there you have it.  If your supposed application wants things
>>>compressed then refer to the file exactly, and it'll be served exactly.
>>>This is a link-design issue when building the html pages on the site.
>>>
>>This is a perfect example of where we should use both TE and
>>Accept-Encoding.  Use the same example but have the client send
>>Accept-Encoding: identity;q=1.0, *;q=0
>>
>>For your first example of /foo.gz it would have to return
>>Content-Type: application/gzip which is exactly what I want,
>>and for old browsers not sending the Accept-Encoding it would
>>return Content-Encoding: gzip which is what they want.
>>The second example of /foo would I believe stay the same since
>>the Accept-Encoding header should not affect the result which
>>is being sent with no Content-Encoding.
>>
> 
> But with application/gzip you have lost critical information -- you
> have lost the content-type of the base file.  Sure you can infer it
> heuristically based on extensions, but... why?  If you send TE
> instead of Accept-Encoding you get the base type and the knowledge
> that you can strip the compression.  With "Accept-Encoding: gzip"
> you get the base type, but you still don't know if you can strip
> the compression.
> 
> Dean
> 
> 
>
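
As a reading aid, the server-side behaviour Dean describes can be sketched roughly as below. This is not Apache code. choose_response, accepts_gzip and the files dict (filename to size in bytes) are hypothetical. Preferring TE over Accept-Encoding when both are sent is my assumption, and the parser ignores "*" wildcards.

```python
def accepts_gzip(header_value):
    """True if a TE or Accept-Encoding value lists gzip with q > 0."""
    if not header_value:
        return False
    for item in header_value.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        if parts[0].lower() == "gzip" and q > 0:
            return True
    return False


def choose_response(path, files, request_headers):
    """Return (filename, response_headers) for a request path."""
    name = path.lstrip("/")
    if name in files:
        # The URL names an existing file: serve it as is, no negotiation.
        headers = {"Content-Length": str(files[name])}
        if name.endswith(".gz"):
            headers["Content-Encoding"] = "gzip"
        return name, headers
    gz, ident = name + ".gz", name + ".identity"
    if gz in files and accepts_gzip(request_headers.get("TE")):
        # Compressed for transfer only; no Content-Length, as in the thread.
        return gz, {"Transfer-Encoding": "gzip, chunked"}
    if gz in files and accepts_gzip(request_headers.get("Accept-Encoding")):
        return gz, {"Content-Encoding": "gzip",
                    "Content-Length": str(files[gz])}
    if ident in files:
        return ident, {"Content-Length": str(files[ident])}
    return None, {}  # nothing suitable; a real server would answer with an error
```

With foo.xls.identity and foo.xls.gz on disk, a request for /foo.xls carrying "TE: gzip" gets the gzipped bytes marked as a transfer encoding, "Accept-Encoding: gzip" gets them marked as a content encoding, and a request with neither header gets the identity file. A request for /foo.tar.gz is served exactly as stored.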