URL: <https://savannah.gnu.org/bugs/?67488>
Summary: Consider saving the URL that was fetched

Group: GNU Wget
Submitter: eokoochu
Submitted: Tue 09 Sep 2025 03:33:36 PM GMT
Category: Feature Request
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: None
Operating System: GNU/Linux
Reproducibility: None
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None

_______________________________________________________

Follow-up Comments:

-------------------------------------------------------
Date: Tue 09 Sep 2025 03:33:36 PM GMT By: Eo Koochu <eokoochu>
When archiving a webpage, this command is quite useful:

    $ wget -P "$dir" -E -H -k -K -p "$url"

The annoying thing is that it leaves no record of which URL was fetched. Storing that information would be useful in its own right, and it would also remedy another problem: all the content is scattered across a tree of files. Which file do we need to tell the browser to open later?

I have written a wrapper script for wget that writes the fetched URL to a file named “url.txt”. It’s very useful for working out later which file in the tree the browser needs to open. It’s a hack, though. Ideally wget itself should store the URL in a way that solves both problems, so that we have a record of what was fetched and therefore of what to open in a browser. And since webpages often change, it might be useful to record the date of the snapshot somewhere too.

For reference, there is a Firefox extension called SingleFile that saves a webpage, together with everything needed to render it, as a single file. When it does that, it adds a comment near the top of the HTML file containing the URL and the date the page was saved.
E.g.:

    <!DOCTYPE html>
    <html lang=en data-color-mode=auto data-light-theme=light data-dark-theme=dark data-a11y-animated-images=system data-a11y-link-underlines=true class=js-focus-visible data-js-focus-visible data-turbo-loaded style><!--
     Page saved with SingleFile
     url: https://savannah.gnu.org/bugs/?group=wget
     saved date: Tue Sep 09 2025 17:21:47 GMT+0200 (Central European Summer Time)
    --><meta charset=utf-8>

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?67488>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
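The report does not include the wrapper script itself. Below is a minimal POSIX sh sketch of the same idea, not the submitter's actual script: the function names write_snapshot_meta and archive_page are hypothetical, and only the wget options and the file name “url.txt” come from the report. Besides the URL it records a UTC date, as the report suggests. It does not work out which file in the tree is the entry point; that still has to be inferred from the recorded URL, which is the part the report asks wget itself to handle.

```shell
#!/bin/sh
# Hypothetical sketch of a wget wrapper that records what was fetched.
# Only the wget options and the file name "url.txt" come from the report.

# Write the fetched URL and a UTC timestamp to "$1/url.txt".
write_snapshot_meta() {
    meta_dir=$1
    meta_url=$2
    mkdir -p "$meta_dir" || return 1
    {
        printf 'url: %s\n' "$meta_url"
        # UTC, so snapshots taken in different time zones compare cleanly.
        printf 'saved date: %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
    } > "$meta_dir/url.txt"
}

# Fetch the page and its requisites into "$1"; return wget's exit status.
archive_page() {
    arch_dir=$1
    arch_url=$2
    # Record the metadata first, so it exists even if wget exits non-zero
    # because one of the page requisites failed to download.
    write_snapshot_meta "$arch_dir" "$arch_url" || return 1
    wget -P "$arch_dir" -E -H -k -K -p "$arch_url"
}
```

After sourcing the file, a snapshot would be taken with archive_page "$dir" 'https://savannah.gnu.org/bugs/?67488'. Quoting the URL matters because `?` is a shell glob character. POSIX sh has no local variables, so the two functions use distinct variable names to avoid overwriting each other's arguments.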