Behdad Esfahbod <[EMAIL PROTECTED]> writes:

> If I use the 1.8.2 version, although I get 100 different log files,
> but get only 14 index.html files.
And that was a bug, because those HTML files are likely to be both overwritten and written to concurrently by, on average, 7.14 Wget processes per file (100 downloads spread over 14 files). On my system, with Wget 1.9.1, running your command results in 81 log files and exactly one index.html.

> With the CVS, I see in the log that it's trying to find a
> nonexistence file:
>
> Connecting to behdad.org|217.160.226.67|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> index.html.21 has sprung into existence.
> Retrying.

The "sprung into existence" message is a compromise. Wget "promises" the user that it will use a certain file name for saving the output. This file name is based on the URL and on whether the file exists *before the download starts*. However, Wget opens the file only later, when the data starts to arrive. (This avoids leaving a stray file behind if the user interrupts the download.) Now, if the file has appeared by the time the data starts arriving, what is Wget to do? If it silently changes the file name, it breaks the promise and misinforms the user. If it writes to the file anyway, it risks corrupting data and leaves itself open to race conditions. The compromise is to retry the download using a different file name, as indicated by the "sprung into existence" message.

Keep in mind that a user will not see this message unless the file appears between the time the file name is printed and the time the data starts to arrive. Outside test-case scenarios like the one you presented, this is highly unlikely.
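To make the sequence concrete, here is a minimal Python sketch of the "announce a name, create the file only when data arrives, retry if it sprang into existence" logic. It is not Wget's code: `unique_name`, `save_download`, the `chunks` callable (standing in for restarting the HTTP download) and the "Saving to:" message are all hypothetical, and the use of `O_EXCL` for the late, race-free creation is this sketch's choice, not a claim about how Wget opens its files. The numbered-suffix scheme (`index.html`, `index.html.1`, ...) follows the file names seen in the log above.

```python
import os


def unique_name(base: str) -> str:
    """Hypothetical helper: pick the first free name among base, base.1, base.2, ..."""
    if not os.path.exists(base):
        return base
    n = 1
    while os.path.exists(f"{base}.{n}"):
        n += 1
    return f"{base}.{n}"


def save_download(base: str, chunks, max_retries: int = 20) -> str:
    """Sketch: announce a file name, then create it exclusively only when data arrives.

    `chunks` is a hypothetical zero-argument callable that (re)starts the
    download and returns an iterable of bytes.
    """
    for _ in range(max_retries):
        name = unique_name(base)          # the name "promised" to the user
        print(f"Saving to: {name}")       # hypothetical message
        data = iter(chunks())             # (re)start the download
        first = next(data, b"")           # wait for the first data to arrive
        try:
            # O_EXCL makes creation fail if the file already exists, so we never
            # write into a file that another process created in the meantime.
            fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            print(f"{name} has sprung into existence.\nRetrying.")
            continue                      # keep the promise: pick a new name and retry
        with os.fdopen(fd, "wb") as f:
            f.write(first)
            for chunk in data:
                f.write(chunk)
        return name
    raise RuntimeError("could not find a free file name")
```

Creating the file with `O_EXCL` turns "the file appeared after the name was chosen" into an explicit error instead of a silent overwrite, which is exactly the situation where the retry is needed.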