URL:
  <https://savannah.gnu.org/bugs/?67488>

                 Summary: Consider saving the URL that was fetched
                   Group: GNU Wget
               Submitter: eokoochu
               Submitted: Tue 09 Sep 2025 03:33:36 PM GMT
                Category: Feature Request
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name:
        Originator Email:
             Open/Closed: Open
         Discussion Lock: Any
                 Release: None
        Operating System: GNU/Linux
         Reproducibility: None
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None


    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: Tue 09 Sep 2025 03:33:36 PM GMT By: Eo Koochu <eokoochu>
When archiving a webpage, this command is quite useful:

$ wget -P "$dir" -E -H -k -K -p "$url"

The annoying thing is that it leaves no record of which URL was fetched.
Storing that information would be useful in itself, and it would also help
with a second problem: the content is scattered across a tree of files, so it
is unclear which file the browser should open later.

I have written a wrapper script for wget that writes the fetched URL to a
file named “url.txt”. That makes it much easier to work out later which file
in the tree the browser needs to open. It’s a hack, though. Ideally wget
itself would store the URL in a way that solves both problems, so that we
have metadata recording what was fetched and therefore what to open in a
browser. And since webpages often change, it might be useful to record the
date of the snapshot somewhere too.
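
A wrapper along those lines can be sketched as follows. This is a minimal
illustration rather than the actual script: the function names
save_fetch_record and archive_page are hypothetical, and the two-line layout
of url.txt (the URL, then a UTC timestamp) is an assumption. Only the file
name and the wget options come from the report above.

```shell
#!/bin/sh

# Hypothetical helper: record which URL was archived into a directory, and
# when. The two-line layout (URL, then UTC timestamp) is an assumption of
# this sketch, not something wget or the original wrapper defines.
save_fetch_record() {
    dir=$1
    url=$2
    mkdir -p "$dir" || return 1
    {
        printf '%s\n' "$url"
        date -u '+%Y-%m-%dT%H:%M:%SZ'
    } > "$dir/url.txt"
}

# Hypothetical wrapper: run the wget command from the report, then write the
# record. wget's exit status is preserved so callers can still inspect it.
archive_page() {
    dir=$1
    url=$2
    wget -P "$dir" -E -H -k -K -p "$url"
    status=$?
    save_fetch_record "$dir" "$url" || return 1
    return "$status"
}
```

Calling archive_page "$dir" "$url" runs the fetch and leaves "$dir/url.txt"
beside the downloaded tree. The record is written even when wget exits
non-zero, because a fetch with page requisites can report an error for a
single missing object while the page itself was saved.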

For reference, there is a Firefox extension called SingleFile that saves a
webpage, together with all the objects needed to render it, in a single file.
When it does that, it adds a comment near the top of the HTML file that
contains the URL and the date it was saved. E.g.:

<!DOCTYPE html> <html lang=en data-color-mode=auto data-light-theme=light
data-dark-theme=dark data-a11y-animated-images=system
data-a11y-link-underlines=true class=js-focus-visible data-js-focus-visible
data-turbo-loaded style><!--
 Page saved with SingleFile 
 url:  https://savannah.gnu.org/bugs/?group=wget
 saved date: Tue Sep 09 2025 17:21:47 GMT+0200 (Central European Summer Time)
--><meta charset=utf-8>
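
Metadata stored this way is easy to read back. As a rough sketch, assuming
the comment keeps "url:" at the start of its own line as in the excerpt above
(the function name singlefile_url is hypothetical, and the layout is inferred
from this one example only):

```shell
#!/bin/sh

# Hypothetical helper: print the URL recorded in a SingleFile-style comment.
# It prints the first line that starts with optional whitespace and "url:",
# with that prefix stripped. It does not parse HTML or check that the line
# is actually inside a comment.
singlefile_url() {
    sed -n 's/^[[:space:]]*url:[[:space:]]*//p' "$1" | head -n 1
}
```

For a file that starts like the excerpt, singlefile_url saved.html would
print the address that was archived.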

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?67488>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
