URL: <http://savannah.gnu.org/bugs/?50320>
Summary: Bad link conversion with mixed HTTP/HTTPS content plus --mirror --adjust-extension
Project: GNU Wget
Submitted by: None
Submitted on: Wed 15 Feb 2017 06:08:54 PM UTC
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name: Thomas Claveirole
Originator Email: thomas.claveirole@green-communications.f
Open/Closed: Open
Discussion Lock: Any
Release: trunk
Operating System: GNU/Linux
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None

_______________________________________________________

Details:

Hello,

I set up a local web server that serves the following page when /wget-test/
is requested, over either HTTP or HTTPS:

<!DOCTYPE html>
<html>
  <head>
    <title>Wget test</title>
  </head>
  <body>
    <script src="http://localhost/wget-test/script.js?foo=bar"></script>
    <script src="https://localhost/wget-test/script.js?foo=bar"></script>
  </body>
</html>

The server also provides a /wget-test/script.js resource (regardless of the
scheme and query string; the content of this file is irrelevant).

Then, running

  wget --mirror --adjust-extension --convert-links http://localhost/wget-test/

rewrites the script links as follows:

<!DOCTYPE html>
<html>
  <head>
    <title>Wget test</title>
  </head>
  <body>
    <script src="script.js%3Ffoo=bar"></script>
    <script src="script.js%3Ffoo=bar.html"></script>
  </body>
</html>

Note that the second link has an incorrect .html suffix appended. On the
filesystem, the downloaded file does not have this suffix, so the link is
broken.

I guess the correct behavior would be not to append the .html suffix, but I
am unsure whether two URLs that differ only in scheme (http:// vs. https://)
should be considered the same resource and rewritten to point to the same
location.

(This test case was derived from trying to mirror a much bigger site, and it
took me some time to pinpoint the issue.
The bug also arises when multiple pages from the website link to the same
resource using mixed http and https schemes -- which is a more realistic
scenario.)

Looking at the bug tracker, I get the feeling that this bug might be related
to #50173 and #25340, but this is unclear to me.

Find attached a debug log for:

  wget -o wget.log --debug --no-check-certificate --mirror --adjust-extension --convert-links http://localhost/wget-test/

with my setup.

Regards,
Thomas Claveirole

_______________________________________________________

File Attachments:

-------------------------------------------------------
Date: Wed 15 Feb 2017 06:08:54 PM UTC  Name: wget.log  Size: 9kB  By: None

<http://savannah.gnu.org/bugs/download.php?file_id=39762>

_______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?50320>

_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/