[jira] [Comment Edited] (SPARK-22851) Download mirror for spark-2.2.1-bin-hadoop2.7.tgz has file with incorrect checksum

John Brock (JIRA) Thu, 21 Dec 2017 16:00:27 -0800

    [ https://issues.apache.org/jira/browse/SPARK-22851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300740#comment-16300740 ]


John Brock edited comment on SPARK-22851 at 12/21/17 11:59 PM:
---------------------------------------------------------------

I think the inconsistent behavior in Chrome is due to different headers being 
sent back from the mirrors:

{code:none}
> curl -I http://apache.mirrors.pair.com/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
HTTP/1.1 200 OK
Date: Thu, 21 Dec 2017 23:46:58 GMT
Server: Apache/2.2.29
Last-Modified: Sat, 25 Nov 2017 02:44:26 GMT
ETag: "32b662-bfa03c4-55ec5a5c358a1"
Accept-Ranges: bytes
Content-Length: 200934340
Content-Type: application/x-tar
Content-Encoding: x-gzip

> curl -I http://apache.cs.utah.edu/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
HTTP/1.1 200 OK
Date: Thu, 21 Dec 2017 23:47:19 GMT
Server: Apache/2.2.14 (Ubuntu)
Last-Modified: Sat, 25 Nov 2017 02:44:26 GMT
ETag: "2ae630-bfa03c4-55ec5a5c0d680"
Accept-Ranges: bytes
Content-Length: 200934340
Content-Type: application/x-gzip
{code}


Note that for the first mirror above, {{Content-Type}} is 
{{application/x-tar}}, and {{Content-Encoding}} is {{x-gzip}}. For the second 
mirror above, {{Content-Type}} is {{application/x-gzip}} and there is no 
{{Content-Encoding}} value.
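
To make the comparison above mechanical, here is a minimal sketch that parses two {{curl -I}} responses and reports which content headers differ. The helper names ({{parse_headers}}, {{diff_headers}}) are hypothetical, and the two response strings are abbreviated copies of the output shown above.

```python
def parse_headers(raw):
    """Parse `curl -I` output into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the HTTP status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def diff_headers(a, b, names=("content-type", "content-encoding")):
    """Return {name: (value_in_a, value_in_b)} for the named headers that differ."""
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Abbreviated responses from the two mirrors (only the content headers are kept).
PAIR = """HTTP/1.1 200 OK
Content-Length: 200934340
Content-Type: application/x-tar
Content-Encoding: x-gzip"""

UTAH = """HTTP/1.1 200 OK
Content-Length: 200934340
Content-Type: application/x-gzip"""

differences = diff_headers(parse_headers(PAIR), parse_headers(UTAH))
for name, (first, second) in differences.items():
    print(f"{name}: {first!r} vs {second!r}")
```

A missing header shows up as {{None}}, which is how the absent {{Content-Encoding}} on the second mirror is reported.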

In Safari, both mirrors give me a tar, so Safari may be using something other 
than the headers to decide whether a file is a gzipped tarball.

EDIT: See the top answer at 
https://superuser.com/questions/940605/chromium-prevent-unpacking-tar-gz. It 
seems like the "bug" is that the first mirror above sends back a 
{{Content-Encoding}} value of {{x-gzip}}.

{quote}Your web server is likely sending the .tar.gz file with a 
content-encoding: gzip header, causing the web browser to assume a gzip layer 
was applied only to save bandwidth, and what you really intended to send was 
the .tar archive. Chrome un-gzips it on the other side like it would with any 
other file (.html, .js, .css, etc.) that it receives gzipped (it dutifully 
doesn't modify the filename though).

To fix this, make sure your web server serves .tar.gz files without the 
content-encoding: gzip header.

More Info: https://code.google.com/p/chromium/issues/detail?id=83292{quote}
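
The behavior described in the quoted answer can be modeled with a small sketch. This is not Chrome's actual code; {{bytes_saved_by_client}} is a hypothetical stand-in for a client that undoes a gzip {{Content-Encoding}} before writing the file, and the payload is made-up data rather than the real Spark tarball.

```python
import gzip
import hashlib

# HTTP treats "x-gzip" as equivalent to "gzip" as a content coding.
GZIP_CODINGS = {"gzip", "x-gzip"}


def bytes_saved_by_client(body, headers):
    """Model a client that removes a gzip Content-Encoding before saving the body."""
    coding = headers.get("Content-Encoding", "").strip().lower()
    if coding in GZIP_CODINGS:
        return gzip.decompress(body)
    return body


# Stand-in data: an uncompressed "tar" payload and its gzipped form on the server.
tar_bytes = b"stand-in for the contents of a .tar archive\n" * 100
tgz_bytes = gzip.compress(tar_bytes)

first_mirror = {"Content-Type": "application/x-tar", "Content-Encoding": "x-gzip"}
second_mirror = {"Content-Type": "application/x-gzip"}

saved_first = bytes_saved_by_client(tgz_bytes, first_mirror)
saved_second = bytes_saved_by_client(tgz_bytes, second_mirror)

published = hashlib.sha512(tgz_bytes).hexdigest()
print("first mirror matches published hash: ", hashlib.sha512(saved_first).hexdigest() == published)
print("second mirror matches published hash:", hashlib.sha512(saved_second).hexdigest() == published)
```

Under this model both mirrors serve identical bytes, but only the download from the second mirror still hashes to the published value, because the first one is saved as the bare tar under the original {{.tgz}} name.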


> Download mirror for spark-2.2.1-bin-hadoop2.7.tgz has file with incorrect 
> checksum
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-22851
>                 URL: https://issues.apache.org/jira/browse/SPARK-22851
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.1
>            Reporter: John Brock
>            Priority: Critical
>
> The correct sha512 is:
> 349ee4bc95c760259c1c28aaae0d9db4146115b03d710fe57685e0d18c9f9538d0b90d9c28f4031ed45f69def5bd217a5bf77fd50f685d93eb207445787f2685.
> However, the file I downloaded from 
> http://apache.mirrors.pair.com/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
> is giving me a different sha512:
> 039935ef9c4813eca15b29e7ddf91706844a52287999e8c5780f4361b736eb454110825224ae1b58cac9d686785ae0944a1c29e0b345532762752abab9b2cba9
> It looks like this mirror has a file that isn't actually gzipped, just 
> tarred. If I ungzip one of the copies of spark-2.2.1-bin-hadoop2.7.tgz with 
> the correct sha512, and take the sha512 of the resulting tar, I get the same 
> incorrect hash above of 
> 039935ef9c4813eca15b29e7ddf91706844a52287999e8c5780f4361b736eb454110825224ae1b58cac9d686785ae0944a1c29e0b345532762752abab9b2cba9.
> I asked some colleagues to download the incorrect file themselves to check 
> the hash -- some of them got a file that was gzipped and some didn't. I'm 
> assuming there's some caching or mirroring happening that may give you a 
> different file than the one I got.
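
The check described in the issue description (gunzip a known-good copy, hash the resulting tar, and compare it with the hash of the suspect download) can be sketched as follows. The function names are hypothetical and the byte strings are stand-ins; for the real files you would read each download with {{open(path, "rb")}} instead.

```python
import gzip
import hashlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def sha512_hex(data):
    """Return the SHA-512 digest of data as a hex string."""
    return hashlib.sha512(data).hexdigest()


def is_gzip(data):
    """Check the gzip magic number rather than trusting the file extension."""
    return data[:2] == GZIP_MAGIC


def explain_mismatch(suspect, known_good_tgz):
    """Describe how a suspect download relates to a known-good .tgz."""
    if sha512_hex(suspect) == sha512_hex(known_good_tgz):
        return "identical"
    gunzipped = gzip.decompress(known_good_tgz)
    if not is_gzip(suspect) and sha512_hex(suspect) == sha512_hex(gunzipped):
        return "gunzipped copy"
    return "unrelated"


# Stand-in data in place of the real Spark downloads.
tar_bytes = b"stand-in tar contents\n" * 50
good_tgz = gzip.compress(tar_bytes)

print(explain_mismatch(good_tgz, good_tgz))
print(explain_mismatch(tar_bytes, good_tgz))
```

A result of "gunzipped copy" corresponds to the situation reported here: the file has the {{.tgz}} name but contains only the tar.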


