Hello, Wget can be used to download from archive.org, but larger sites will not work because archive.org stops allowing access after a certain amount of time (after a few days).
Also, archive.org recently changed their policy so that if a URL times out, that URL may not be accessed again for the whole day. This breaks wget terribly; I hope to write them about this. Wget2, as I think I mentioned on the list already, changes the behavior from wget and grabs far too much. I wanted to fix this bug, but have yet to code it (sorry!)

But assuming you can put up with these difficulties, here's what you need:

    wget -NEkrlXXX -t XXX --timeout XXX \
      --reject-regex 'http.*http.*http|\.html?.*\.html?.*\.html?|www\..*www\..*www\.' \
      --accept-regex '(.*\.(css|gif|png|jpe?g|webp|svg)$|https?://web\.archive\.org/web/[^ *]+/https?://?(i0.wp.com|i[0-9].wp.com|s[0-9].wp.com|([0-9]\.)?bp.blogspot.com|www.blogger.com|www.blogblog.com|lh[0-9]\.googleusercontent.com|fonts.googleapis.com|(ssl|www|fonts).gstatic.com|(www[0-9]*?\.)?URL))' \
      'URL'

(The trailing backslashes let the shell read this as one command; you can also join it into a single line.)

-N for timestamps, which are needed most of the time.
-E to change the extension of the file, which is necessary far too often.
-k to convert the URLs.
-r for recursion, and -l for how far down to go.

The reject regex is minimal; it just prevents recursive downloading of other sites -- you'd be surprised how many times this has to be used. The accept regex ensures that wget stays on the straight and narrow path of only getting the site and its page requisites from other hosts, such as images on wp.com, blogspot, etc.

You'll have to change each XXX to whatever number you think is best, and each URL to the address you're mirroring, minus the www and http/https portions.

You're welcome, David
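Before committing to a long crawl, it can help to sanity-check the two regexes against sample URLs. Here's a small sketch using grep -E; note that "example.com" stands in for the URL placeholder, the sample URLs are made up, and the reject-before-accept ordering here only approximates how wget combines the two options:

```shell
#!/bin/sh
# Sanity-check the reject/accept regexes against sample URLs.
# "example.com" is a stand-in for the URL placeholder above.

reject='http.*http.*http|\.html?.*\.html?.*\.html?|www\..*www\..*www\.'
accept='(.*\.(css|gif|png|jpe?g|webp|svg)$|https?://web\.archive\.org/web/[^ *]+/https?://?(i0.wp.com|i[0-9].wp.com|s[0-9].wp.com|([0-9]\.)?bp.blogspot.com|www.blogger.com|www.blogblog.com|lh[0-9]\.googleusercontent.com|fonts.googleapis.com|(ssl|www|fonts).gstatic.com|(www[0-9]*?\.)?example.com))'

# Print whether a URL would be rejected, accepted, or skipped.
check () {
  url=$1
  if printf '%s\n' "$url" | grep -Eq "$reject"; then
    echo "REJECT $url"
  elif printf '%s\n' "$url" | grep -Eq "$accept"; then
    echo "ACCEPT $url"
  else
    echo "SKIP   $url"
  fi
}

# A capture whose target smuggles in a second full URL (three
# "http"s in total) trips the reject regex:
check 'https://web.archive.org/web/2020/http://a.com/?u=http://b.com'

# A capture of a page on the mirrored site itself is accepted:
check 'https://web.archive.org/web/20200101000000/https://www.example.com/about'

# A capture of an unrelated third-party page matches neither:
check 'https://web.archive.org/web/20200101000000/https://unrelated.org/page'
```

Running it prints REJECT, ACCEPT, and SKIP for the three sample URLs in turn, which is a quick way to see whether an edited regex still keeps the crawl on the target site.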