Follow-up Comment #1, bug #66468 (group wget): No matter what I tried, I was unable to prevent wget from erasing at least some of my existing 20,000 files, even with --no-clobber in effect; wget overwrites about 5% of them.
I ended up working around the bug by writing my own URL-looping routine in Bash that loads the URLs from a file into an array, loops over them, takes the last part of each URL as the output filename, and skips any output file that already exists and contains data (a rough sketch is at the end of this comment). The downside of doing it this way is that wget is reopened 20,000 times and connects to the remote server 20,000 times, because a Keep-Alive connection cannot be used. It is slower and puts more strain on the remote server, but it never erases local files.

I think silently erasing possibly irreplaceable local data files when being _explicitly told not to touch them_ is a really major bug that should be addressed! A simple example of where this can cause mass destruction is someone archiving pages on a web site as soon as new pages appear, but never wanting any future updates to those pages to show up (including the page being removed). As it is, wget can randomly and silently erase these locally archived files despite --no-clobber. Very dangerous.
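For reference, here is a rough sketch of the kind of loop I mean. It is not my exact script; the urls.txt filename and the -q/-O options are only illustrative:

    #!/usr/bin/env bash
    # Rough sketch of the workaround (not the exact script).
    # Assumes the URL list lives in urls.txt, one URL per line, and that
    # the last path component of each URL is a usable local filename.

    mapfile -t urls < urls.txt

    for url in "${urls[@]}"; do
        file="${url##*/}"          # last part of the URL as the output name
        # Skip output files that already exist and contain data, so local
        # files are never touched, let alone overwritten.
        if [[ -s "$file" ]]; then
            continue
        fi
        # One wget invocation (and one connection) per URL: slow but safe.
        wget -q -O "$file" "$url"
    done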