URL: <https://savannah.gnu.org/bugs/?62869>
Summary: if retry hits a 302 FOUND wget forgets to send the Range header thus appending the whole file to what's downloaded alrdy
Project: GNU Wget
Submitter: correabuscar
Submitted: Sat 06 Aug 2022 05:49:12 AM UTC
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Release: trunk
Discussion Lock: Any
Operating System: GNU/Linux
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: Yes

_______________________________________________________

Follow-up Comments:

-------------------------------------------------------
Date: Sat 06 Aug 2022 05:49:12 AM UTC By: Emanuel Czirai <correabuscar>
Hello.

I've encountered this append bug on Gentoo with wget-1.21.3-r1 while portage was downloading the file android-studio-2022.1.1.9-linux.tar.gz for Android Studio Canary (a 1G file, which was 1.6G on disk and thus corrupt due to this bug).

I have a log, *problem_on_real_url.log*, with the wget output from the second time I reproduced the above, which yielded a file that was 24 MiB larger. I haven't redacted anything in it (like my IP address). I haven't attached it yet because only 4 files can be attached; if you need to see it, let me know and I will attach it in the next comment.

I couldn't reproduce it every time because those Google servers don't always yield a 302 FOUND after a timeout, and they don't always time out either.

So I've come up with a test that always reproduces this issue. (Unfortunately, I couldn't figure out how to make it a test case; the test suite doesn't seem to have the needed functionality.)

The test uses a server that pretends to time out in the middle of the transfer. When wget retries, the server gives a 302 FOUND <https://www.rfc-editor.org/rfc/rfc7231.html#section-6.4.3> and redirects to another server. This is when wget forgets to send the Range header, which specifies where the server should continue sending the file from. The server therefore sends the full file from the beginning, while wget still acts as if the file is being sent from the continue point, and so it appends the full file to whatever it had already downloaded before the timeout (and the 302) occurred.

I've attached files:
a.py
go
tst
wget_no_append_on302_uponretry.patch

To run the test and check that the bug exists, first *chmod a+x go tst*, then run (as normal user, always):
./go
or, to see wget --debug output:
./go --debug
or
./go bug --debug
The last line should be in red: "Bug still present!"

To see how wget acts when the server doesn't do a 302 redirect after a timeout (i.e. it never hits this bug), run:
./go nobug --debug
This will always say as last line: "Bug is fixed."

To test both:
./tst
For this test script, if the bug is not fixed you get a yellow/brown last line: "ok, bug test is fine ie. wget isn't fixed (but it should eventually be, hence why this is yellow)", but if the bug is fixed, you get: "Failed to reveal the bug, was the wget bug fixed?! (assume this is green if you know that wget got fixed)"

The test wants to wget the file with contents "Hello World.\r\n", but the server induces a timeout after "Hello ", which causes wget to retry. The server then gives a 302, which wget follows, and after that wget no longer sends a Range header, so the server replies with 200 OK instead of 206 Partial Content. The final file contents are therefore "Hello Hello World.\r\n" when the bug is present, showing that the whole file (which is "Hello World.\r\n") just got appended to whatever was already downloaded (which is the first "Hello ").

Apply the attached patch to wget to see a hacky proof-of-concept fix which makes wget send a Range header after the 302 happens, by pretending that wget was run with the --start-pos=X arg, where X is the file offset it should have continued from. It's a hack, not the actual fix.

_______________________________________________________

File Attachments:

-------------------------------------------------------
Date: Sat 06 Aug 2022 05:49:12 AM UTC Name: wget_no_append_on302_uponretry.patch Size: 1KiB By: correabuscar
test for the bug presence and hacky poc patch
<http://savannah.gnu.org/bugs/download.php?file_id=53534>
-------------------------------------------------------
Date: Sat 06 Aug 2022 05:49:12 AM UTC Name: go Size: 993B By: correabuscar
test for the bug presence and hacky poc patch
<http://savannah.gnu.org/bugs/download.php?file_id=53533>
-------------------------------------------------------
Date: Sat 06 Aug 2022 05:49:12 AM UTC Name: tst Size: 2KiB By: correabuscar
test for the bug presence and hacky poc patch
<http://savannah.gnu.org/bugs/download.php?file_id=53532>
-------------------------------------------------------
Date: Sat 06 Aug 2022 05:49:12 AM UTC Name: a.py Size: 8KiB By: correabuscar
test for the bug presence and hacky poc patch
<http://savannah.gnu.org/bugs/download.php?file_id=53531>

_______________________________________________________

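The failure mode described in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the server is a toy in-memory function, and the names range_header, server_reply and simulate are hypothetical. It is not wget's code and not the attached a.py.

```python
FULL = b"Hello World.\r\n"   # file contents used by the test in the report
PARTIAL = FULL[:6]           # b"Hello ", what was on disk when the timeout hit


def range_header(offset: int) -> dict:
    """Headers a client would send to resume from `offset` (hypothetical helper)."""
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}


def server_reply(body: bytes, headers: dict) -> tuple:
    """Toy server: 206 with the tail if a Range header is present, else 200 with everything."""
    value = headers.get("Range")
    if value is None:
        return 200, body
    start = int(value[len("bytes="):-1])  # parses only the "bytes=N-" form
    return 206, body[start:]


def simulate(drop_range_on_redirect: bool) -> bytes:
    """Return the final file contents after the retry that follows the 302."""
    headers = {} if drop_range_on_redirect else range_header(len(PARTIAL))
    status, payload = server_reply(FULL, headers)
    # Assumption for illustration: the client appends without checking whether
    # the reply was 206 or 200, which matches the behaviour the report describes.
    return PARTIAL + payload


print(simulate(drop_range_on_redirect=True))   # Range lost after the redirect
print(simulate(drop_range_on_redirect=False))  # Range kept after the redirect
```

With the Range header dropped, the toy server answers 200 and the whole body is appended to the partial file; with the header kept, it answers 206 and only the missing tail is appended.
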
Reply to this item at:

  <https://savannah.gnu.org/bugs/?62869>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/