Hi Paul,

Thank you very much indeed for your very informative and helpful reply and for the link to your MakeStaticSite tool. I will try it out.

Kind regards

Tim

----- Original Message -----
From: Paul Trafford
To: timsc...@timscrim.co.uk ; bug-wget@gnu.org
Sent: Thursday, March 13, 2025 10:22 AM
Subject: Re: Problem downloading a website from archive.org

Hello Tim,

Websites on archive.org or, more specifically, web.archive.org, are, as you've observed, stored piecemeal as snapshots. When browsing, the Wayback Machine stitches the snapshots together. The problem of retrieval for the likes of Wget is explained by Archive Team:

https://wiki.archiveteam.org/index.php?title=Restoring

As an attempted solution, I have developed a prototype tool, MakeStaticSite, which runs Wget iteratively, downloading snapshots selectively to minimise repetition, then merging them into a canonical form.

https://makestaticsite.sh/
https://github.com/paultraf/makestaticsite

Otherwise, there are various approaches, APIs and tools. See, e.g.:

https://archive.org/help/wayback_api.php

Regards,
Paul

Paul Trafford
Oxford, UK

On 12/03/2025 22:57, timsc...@timscrim.co.uk wrote:

Hi Everyone,

I am trying to download a complete website from archive.org using Wget but I have run into a problem.

If you are a human and you are exploring an old website on archive.org, you may notice that sometimes when you click on a link from one page on the website to another, the datestamp part of the URL changes. You can also end up on the same page as you were on previously but with a different datestamp.

This is not much of a problem if you are a human, but it is a problem for web crawlers such as Wget because they can end up duplicating some parts of a website many times and not reaching other parts of the website for a very long time. This is the problem I am having.

Do any of you know the solution to this problem?

Thank you very much.

Kind regards

Tim

P.S. I am sorry if this is a duplicate, but I previously posted it before subscribing so I don't know if it went through.
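
As a rough illustration of the datestamp issue discussed in this thread, the sketch below shows one way a crawl list could be de-duplicated: it splits a Wayback Machine URL into its datestamp and the original URL, then keeps only the first snapshot seen for each original URL. It is not how MakeStaticSite or Wget works. The function names are hypothetical, the example.com URLs are placeholders, and the pattern assumes the common `https://web.archive.org/web/<datestamp>/<original URL>` layout, with an optional two-letter modifier such as `id_` after the datestamp.

```python
import re

# Assumed layout: https://web.archive.org/web/<datestamp>[<modifier>]/<original URL>
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/"
    r"(?P<ts>\d{4,14})(?P<mod>[a-z]{2}_)?/(?P<orig>.+)$"
)


def split_wayback_url(url):
    """Return (datestamp, original_url), or None if url is not a snapshot URL."""
    m = WAYBACK_RE.match(url)
    if m is None:
        return None
    return m.group("ts"), m.group("orig")


def dedupe_by_original(urls):
    """Keep the first snapshot URL seen for each original URL, in input order."""
    first_seen = {}
    for url in urls:
        parts = split_wayback_url(url)
        if parts is None:
            continue  # skip anything that is not a Wayback snapshot URL
        _, orig = parts
        first_seen.setdefault(orig, url)
    return list(first_seen.values())


if __name__ == "__main__":
    # Hypothetical crawl queue: the same page appears under two datestamps.
    queue = [
        "https://web.archive.org/web/20050101000000/http://example.com/a.html",
        "https://web.archive.org/web/20060215120000/http://example.com/a.html",
        "https://web.archive.org/web/20050101000000/http://example.com/b.html",
    ]
    for url in dedupe_by_original(queue):
        print(url)
```

Note that this compares original URLs as plain strings, so variants such as `http://` versus `https://` or a trailing slash would still be treated as different pages; a real crawl would need further normalisation.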