Hi,

This post is to suggest a new feature for wget: an option to pre-allocate disk space for downloaded files. (Maybe have a --pre-allocate command-line option?)

The ability to pre-allocate space for files would be useful for a couple of reasons:

- By pre-allocating all space before downloading, the risk of exiting due to a disk-full error is avoided. When downloading from a server which doesn't support resuming downloads, an accidental disk-full condition means you have to re-download the whole file after freeing up some disk space. That wastes a lot of time and network bandwidth.

- Disk fragmentation can be reduced. Downloading large files can take many hours. While wget is downloading, other programs (web browser cache, email client etc.) can cause a lot of other disk activity. The result is that the wget output file can end up unnecessarily fragmented. Likewise, files written by other programs while wget is running end up more fragmented.

On Linux, fallocate() and posix_fallocate() can be used to pre-allocate space. The advantage of fallocate() is that, by using the FALLOC_FL_KEEP_SIZE flag, space is allocated but the apparent file size is unchanged. That means resuming with --continue works as normal. posix_fallocate(), on the other hand, sets the file length to its full size, meaning that --continue won't work unless there were some way to specify the byte offset that wget should continue from.

The fallocate program (see "man 1 fallocate") can be used to manually pre-allocate space. For a single file that's a slight hassle but simple enough. (Run wget to determine the file length, break, use fallocate to allocate space, then re-run wget.) But when using wget to download many files in one session it's not really practical.

Of course, if the web server does not report the file size, it won't be possible to pre-allocate space. Or would it...? Suppose the user is downloading some CD ISO images from a server which does not report file lengths.
If the user could tell wget to pre-allocate 800MB for each file, and then have wget call ftruncate() when each file has finished downloading, that should achieve a result almost as good as if the server did report file lengths.

-- 
Mark
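
To make the fallocate()/ftruncate() idea above a little more concrete, here is a rough, Linux-only C sketch. It is not taken from wget's source. The helper names reserve_space(), apparent_size() and release_unused() are made up for illustration, and a real patch would have to fit into wget's own file handling.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes the Linux-specific fallocate() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reserve `guess` bytes for fd without changing the
   apparent file size (FALLOC_FL_KEEP_SIZE). Returns 0 on success, or -1
   with errno set if the kernel or filesystem refuses; the download could
   then simply carry on without a reservation. */
static int reserve_space(int fd, off_t guess)
{
    assert(guess > 0);
    return fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, guess);
}

/* Hypothetical helper: the apparent length of the file, i.e. the value a
   --continue style resume would look at. Returns -1 on error. */
static off_t apparent_size(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? st.st_size : (off_t)-1;
}

/* Hypothetical helper: call once the last byte has been written.
   Truncating to the number of bytes actually received is intended to
   give back any reserved blocks beyond that point (the exact behaviour
   may depend on the filesystem). */
static int release_unused(int fd, off_t bytes_written)
{
    return ftruncate(fd, bytes_written);
}
```

With something like this, wget would call reserve_space() right after opening the output file, using either the reported length or a user-supplied guess such as 800MB, and release_unused() once the transfer completes. Because the apparent size stays at the number of bytes actually written, an interrupted download should still look the same to --continue.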
