On Fri, 3 Nov 2017, Tim Rühsen wrote:
On 11/03/2017 06:37 AM, James Cloos wrote:
"TR" == Tim Rühsen writes:
TR> I downloaded/tested thousands of web pages and they behave as if
TR> 'Content-Encoding: gzip' is a compression for the transport. Uncompressing
TR> it 'on-the-fly' and saving that uncompressed data was the correct behavior.
Lots of servers have that misconfiguration; it was recommended in the
past, and Apache defaulted to doing that when serving files like .tar.gz.
The GUI browsers had to learn to work around that misconfiguration, and
wget has to as well.
In short, do not uncompress if the destination name has a compression
suffix.
Or, in that case, test whether the data, once decompressed, still starts
with the gzip magic bytes: if so, perform exactly one decompression; if
not, perform none.
And the same for the other compression formats.
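A minimal Python sketch of that heuristic for a response sent with
Content-Encoding: gzip. It covers gzip only (other formats would follow
the same pattern with their own magic bytes); the suffix list and the
function name body_to_save are hypothetical and not part of wget.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"               # first two bytes of every gzip member
COMPRESSED_SUFFIXES = (".gz", ".tgz")  # hypothetical, gzip-only suffix list


def body_to_save(filename: str, body: bytes) -> bytes:
    """Return the bytes to write for a response sent with Content-Encoding: gzip."""
    if not filename.lower().endswith(COMPRESSED_SUFFIXES):
        # Ordinary case: gzip is a transport coding, so undo it once.
        return gzip.decompress(body)
    if not body.startswith(GZIP_MAGIC):
        # The header is simply wrong: the body is not gzip data at all.
        return body
    decoded = gzip.decompress(body)
    if decoded.startswith(GZIP_MAGIC):
        # Compressed twice: one decompression yields the .gz file itself.
        return decoded
    # The header merely restated the file's own format: keep it as received.
    return body
```

Decompressing the whole body just to peek at its first two bytes is
wasteful; a streaming client could inflate only the first few bytes
before deciding.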
Thanks for this insight!
Looking at the Mozilla/Gecko sources shows that the gzip Content-Encoding
is simply cleared for the Content-Types application/x-gzip,
application/gzip and application/x-gunzip. That makes it straightforward
to go that way.
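A minimal Python sketch of that rule as a client might apply it. It
restates the behaviour described above and is not Gecko's or wget's code;
the function name effective_content_encoding is hypothetical, and it
assumes a single coding value (lists such as "gzip, br" are not handled).

```python
from typing import Optional

# Content-Types for which a gzip Content-Encoding is treated as redundant.
GZIP_MEDIA_TYPES = {"application/x-gzip", "application/gzip", "application/x-gunzip"}

# RFC 7230 asks recipients to treat "x-gzip" as equivalent to "gzip".
GZIP_CODINGS = {"gzip", "x-gzip"}


def effective_content_encoding(content_type: Optional[str],
                               content_encoding: Optional[str]) -> Optional[str]:
    """Return the content coding the client should undo, or None for none."""
    if not content_encoding:
        return None
    coding = content_encoding.strip().lower()
    # Drop media-type parameters such as "; charset=..." and normalise case.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if coding in GZIP_CODINGS and media_type in GZIP_MEDIA_TYPES:
        # The gzip coding is the file's own format: save the body as-is.
        return None
    return coding
```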
At least for the gzip types, that seems to be a client-side correction of
server behaviour that is incorrect according to RFC 7231, "Hypertext
Transfer Protocol (HTTP/1.1): Semantics and Content":
https://tools.ietf.org/html/rfc7231#section-3.1.2.2
If the media type includes an inherent encoding, such as a data
format that is always compressed, then that encoding would not be
restated in Content-Encoding even if it happens to be the same
algorithm as one of the content codings. Such a content coding would
only be listed if, for some bizarre reason, it is applied a second
time to form the representation. Likewise, an origin server might
choose to publish the same data as multiple representations that
differ only in whether the coding is defined as part of Content-Type
or Content-Encoding, since some user agents will behave differently
in their handling of each response (e.g., open a "Save as ..." dialog
instead of automatic decompression and rendering of content).
Regards
Jens