Re: Race condition in wget

Hrvoje Niksic Fri, 01 Apr 2005 03:14:49 -0800

Behdad Esfahbod <[EMAIL PROTECTED]> writes:

> If I use the 1.8.2 version, although I get 100 different log files,
> but get only 14 index.html files.


And this was a bug: with 100 processes sharing 14 files, each HTML
file is likely to be both overwritten and concurrently written to by,
on average, 7.14 (100/14) Wget processes.

On my system, with Wget 1.9.1, running your command results in 81 log
files and exactly one index.html.

> With the CVS, I see in the log that it's trying to find a
> nonexistence file:
>
> Connecting to behdad.org|217.160.226.67|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> index.html.21 has sprung into existence.
> Retrying.

The "sprung into existence" message is a compromise.  Wget "promises"
the user that it will use a certain file name for saving the output.
This file name is based on the URL and on whether the file exists
*before the download starts*.  However, Wget opens the file only
later, when the data starts to arrive.  (This is to make sure the file
is not left hanging if the user interrupts the download.)
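
To make the naming step concrete, here is a rough Python sketch of
picking a name from what exists at the start.  It is an illustration
only, not Wget's actual code; the function name unique_name is
hypothetical, and the numbering rule is inferred from the
"index.html.21" in your log.

```python
import os

def unique_name(base):
    """Return base if no such file exists, else the first unused base.N.

    Hypothetical helper: it only checks for existence, it does not
    create or reserve the file, so the result can go stale at any time.
    """
    if not os.path.exists(base):
        return base
    n = 1
    while os.path.exists(f"{base}.{n}"):
        n += 1
    return f"{base}.{n}"
```

Nothing is created here, so another process that runs the same check
before either one opens its file will arrive at the same name.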

Now, if a file with that name exists by the time the data starts
arriving, what is Wget to do?  If it silently changes the file name,
it is effectively breaking the promise and misinforming the user.  If
it writes to the file anyway, it is corrupting data and possibly
leaving itself vulnerable to race conditions.

The compromise is to retry the download, using a different file name,
as indicated by the "sprung into existence" message.  Keep in mind
that a user will not see this message unless the file appears between
the time when the file name is printed and the time when the data
starts to arrive.  Outside test-case scenarios like the one you
presented, this is highly unlikely.
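
The retry logic can be sketched as follows, again in Python and again
only as an illustration under assumptions, not as Wget's
implementation.  The names pick_name and download, the fetch callback
(a stand-in for waiting on the server), and the max_tries limit are
all hypothetical.  The one real mechanism used is the O_CREAT | O_EXCL
flag pair, which makes the open fail if the file already exists.

```python
import os

def pick_name(base):
    """First of base, base.1, base.2, ... that does not exist right now."""
    if not os.path.exists(base):
        return base
    n = 1
    while os.path.exists(f"{base}.{n}"):
        n += 1
    return f"{base}.{n}"

def download(base, fetch, max_tries=5):
    """Save fetch()'s bytes under a promised name, retrying on a clash."""
    for _ in range(max_tries):
        name = pick_name(base)   # the name "promised" to the user
        data = fetch()           # time passes; another process may create name
        try:
            # O_EXCL: fail instead of opening a file someone else created.
            fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            print(f"{name} has sprung into existence.")
            print("Retrying.")
            continue             # start over with a freshly picked name
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        return name
    raise RuntimeError(f"could not find a free name for {base}")
```

The exclusive open means two processes can never both write to the
same file; the loser simply goes around again under a new name, which
is what the message reports.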
