On Sun, 8 Jun 2014 10:35:30 +0530 Darshit Shah <[email protected]> wrote:
> On Sun, May 11, 2014 at 11:28 PM, Ángel González <[email protected]> wrote:
> > On 07/05/14 23:46, Jure Grabnar wrote:
> >>
> >> Hello,
> >>
> >> I wrote two patches regarding this issue based on your suggestions.
> >>
> >> The 1st one is crucial for the retrieve_from_file() function: it
> >> fixes 2 memory corruption bugs.
> >>
> >> The 2nd patch is more of an experiment than the real deal and is
> >> intended to get some feedback. It changes the parallel download to
> >> use one temporary file, which is located in the selected directory.
> >>
> >> Before the download starts it uses posix_fallocate() to allocate
> >> space and mkstemp() to get a unique name. After the download is
> >> completed it uses rename() to rename the temporary file to the
> >> final file name.
> >>
> >> After posix_fallocate() each thread opens the file with fopen("wb").
> >
> > You could use "w+b", even though you're not going to read from it.
> >
> >
> >> I opted for fopen() rather than messing around with file
> >> descriptors because I believe it's more portable. I don't know how
> >> Windows would react to file descriptors and I don't have a proper
> >> Windows system to test it out.
> >
> > It works fine.
> > On Windows, FILE* are a layer on top of fds, which are themselves a
> > layer over HANDLEs. To make things more complex, gnulib provides a
> > different abstraction to wget. But it should work. The only special
> > bit would be the need to add O_BINARY, which gnulib should already
> > be doing for you.
> >
> >
> >> Now, fopen("wb") means the file, which was fallocate'd, is
> >> truncated to zero, but after the first request from the thread
> >> which is responsible for the last chunk, it would grow back to at
> >> least file_size - chunk_size. I'm also not sure how devastating
> >> that is.
> >
> > It's up to the filesystem, but I think it would be better to do
> > open() (or dup()) + fdopen() + fseek() rather than the fopen(, "wb");
> > it also allows you to dispense with the barrier.
> >
> >
> >> I'm attaching a handmade Metalink file which downloads a 50MB file
> >> for testing purposes. Currently all threads connect to the same
> >> server, and I understand we don't support such behaviour, but I
> >> guess 2-3 threads for testing purposes don't hurt anyone. :)
> >>
> Does anyone have any objections to the above patches? Else we can
> merge them.

I'm sending the updated 2nd patch. It uses fopen() with "r+b" and no
longer needs a barrier. I tested it quite a bit and it works OK.

The only problem comes when the number of threads is >= 8: the program
sometimes crashes. I tried debugging it with gdb and valgrind, but to
no avail - under them it doesn't crash (a Heisenbug). From a core dump
I found out the crash is quite random (probably a race condition?):
it's either accessing a free'd pointer or freeing an already free'd
pointer.

I don't want to spend too much time fixing it right now, because once
downloading of a single file is done, Metalink downloading should
migrate to it as well (this bug might be gone then). I believe this
patch is not connected to the bug, though, because Wget crashes even
with the original code.

Best Regards,

Jure Grabnar
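
P.S. For reference, here is a minimal sketch of the single-temp-file
flow described above. This is not the patch itself: create_temp(),
write_chunk() and finish() are made-up helper names, and error
handling is reduced to bare exits.

/* Sketch only: allocate once, let each writer reopen and seek to its
   own offset, rename when everything is in place.  */

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SUFFIX_TEMP ".tmp.XXXXXX"  /* last 6 chars must be 'X' for mkstemp() */

/* Create "<final_name>.tmp.XXXXXX" and reserve file_size bytes up front. */
static char *
create_temp (const char *final_name, off_t file_size)
{
  char *tmp = malloc (strlen (final_name) + sizeof SUFFIX_TEMP);
  int fd;

  if (!tmp)
    exit (1);
  sprintf (tmp, "%s%s", final_name, SUFFIX_TEMP);
  fd = mkstemp (tmp);                       /* returns -1 on failure */
  if (fd == -1 || posix_fallocate (fd, 0, file_size) != 0)
    exit (1);
  close (fd);
  return tmp;
}

/* Each thread reopens the same file and seeks to its own chunk offset.
   "r+b" keeps the preallocated contents instead of truncating them.  */
static void
write_chunk (const char *tmp, off_t offset, const void *buf, size_t len)
{
  FILE *fp = fopen (tmp, "r+b");

  if (!fp)
    exit (1);
  fseeko (fp, offset, SEEK_SET);
  fwrite (buf, 1, len, fp);
  fclose (fp);
}

/* Once every chunk has been written, move the file to its final name. */
static void
finish (char *tmp, const char *final_name)
{
  rename (tmp, final_name);
  free (tmp);
}

Opening with "r+b" instead of "wb" is what keeps the preallocated
contents intact, so the writers no longer need to synchronize on a
barrier before their first write.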
>From a15b4f08efdf59471c45d8f322c72248d75ebd54 Mon Sep 17 00:00:00 2001
From: Jure Grabnar <[email protected]>
Date: Sun, 8 Jun 2014 08:08:38 +0200
Subject: [PATCH] Download to single temporary file.

---
 src/ChangeLog | 14 ++++++++++
 src/http.c    | 10 ++++---
 src/multi.c   | 87 ++++++++++++++++++++++++++---------------------------
 src/multi.h   | 11 ++++----
 src/retr.c    | 18 ++++++-------
 5 files changed, 74 insertions(+), 66 deletions(-)

diff --git a/src/ChangeLog b/src/ChangeLog
index 55a1278..fd76bb1 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,5 +1,19 @@
 2014-05-07  Jure Grabnar <[email protected]>
 
+	* multi.c: Parallel download is now stored in one temporary file rather than
+	multiple files.
+	(SUFFIX_TEMP): Define.
+	(name_temp_files): Use function mkstemp() instead of tmpnam() which is safer
+	and allows for customized path.
+	(init_temp_files, name_temp_files, delete_temp_files, clean_temp_files):
+	Rewritten to work with one file.
+	(merge_temp_files): Remove.
+	(rename_temp_file): Add.
+	* retr.c (retrieve_from_file): Change code to work with one temporary file.
+	* http.c (gethttp): Likewise.
+
+2014-05-07  Jure Grabnar <[email protected]>
+
 	* multi.c: Add condition to fix memory corruption and downloading in parallel
 	in general.
 	* retr.c: Increase buffer size by 1 for '\0' to avoid memory corruption.
diff --git a/src/http.c b/src/http.c
index 388530c..d44530e 100644
--- a/src/http.c
+++ b/src/http.c
@@ -150,7 +150,6 @@ struct request {
 };
 
 extern int numurls;
-
 /* Create a new, empty request. Set the request's method and its
    arguments. METHOD should be a literal string (or it should outlive
    the request) because it will not be freed. ARG will be freed by
@@ -2784,7 +2783,7 @@ read_header:
       REGISTER_PERSISTENT_CONNECTION (4);
       return RETRUNNEEDED;
     }
-  else if (!ALLOW_CLOBBER)
+  else if (!ALLOW_CLOBBER && !opt.metalink_file)
     {
       char *unique = unique_name (hs->local_file, true);
       if (unique != hs->local_file)
@@ -3167,7 +3166,12 @@ read_header:
 #endif /* def __VMS [else] */
 
   /* Open the local file. */
-  if (!output_stream)
+  if (opt.jobs > 1)
+    {
+      fp = fopen (hs->local_file, "r+b");
+      fseek (fp, hs->restval, SEEK_SET);
+    }
+  else if (!output_stream)
     {
       mkalldirs (hs->local_file);
       if (opt.backups)
diff --git a/src/multi.c b/src/multi.c
index 43c2f73..665b50e 100644
--- a/src/multi.c
+++ b/src/multi.c
@@ -36,90 +36,79 @@ as that of the covered work.  */
 #include <pthread.h>
 #include <semaphore.h>
 #include <unistd.h>
+#include <fcntl.h>
 
 #include "multi.h"
 #include "url.h"
 
+
+/* Suffix for temporary files. Last 6 chars must be 'X' because of mkstemp(). */
+#define SUFFIX_TEMP ".tmp.XXXXXX"
+
 static struct range *ranges;
-char **files;
+char *main_file;
 
 /* Allocate space for temporary file names. */
 void
-init_temp_files()
+init_temp_files(char *file_name)
 {
-  int i;
+  int alloc_size = (opt.dir_prefix ? strlen (opt.dir_prefix) + (sizeof "/") : 0) +
+                   strlen (file_name) + (sizeof SUFFIX_TEMP) + 1;
 
-  if(!(files = malloc (opt.jobs * (sizeof *files))))
+  if(!(main_file = malloc (alloc_size)))
     {
-      logprintf (LOG_VERBOSE, "Space for temporary file data could not be allocated.\n");
+      logprintf (LOG_VERBOSE, "Space for temporary file names could not be allocated.\n");
       exit(1);
     }
-
-  for (i = 0; i < opt.jobs; ++i)
-    if(!(files[i] = malloc (L_tmpnam * sizeof(char))))
-      {
-        logprintf (LOG_VERBOSE, "Space for temporary file names could not be allocated.\n");
-        exit(1);
-      }
 }
 
 /* Assign names to temporary files to be used.
  */
 void
-name_temp_files()
+name_temp_files(char *file_name, long long int file_size)
 {
-  int i;
+  int fd;
 
-  for (i = 0; i < opt.jobs; ++i)
-    if(!tmpnam(files[i]))
-      {
-        logprintf (LOG_VERBOSE, "Temporary file name could not be assigned.\n");
-        exit(1);
-      }
+  if(opt.dir_prefix)
+    sprintf(main_file, "%s/%s%s", opt.dir_prefix, file_name, SUFFIX_TEMP);
+  else
+    sprintf(main_file, "%s%s", file_name, SUFFIX_TEMP);
+
+  if(!(fd = mkstemp (main_file)))
+    {
+      logprintf (LOG_VERBOSE, "Temporary file name could not be assigned.\n");
+      exit(1);
+    }
+
+  if (posix_fallocate(fd, 0, file_size))
+    {
+      logprintf (LOG_VERBOSE, "File could not be allocated.\n");
+      exit(1);
+    }
+  close (fd);
 }
 
-/* Merge the temporary files in which the chunks are stored to form the
-   resulting file(output). */
+/* Rename the temporary file used to the final file name. */
 void
-merge_temp_files(char *output)
+rename_temp_file (char *new_file_name)
 {
-  FILE *out, *in;
-  int j, ret;
-  void *buf = malloc (MIN_CHUNK_SIZE);
+  rename (main_file, new_file_name);
 
-  out = fopen (output, "wb");
-  for(j = 0; j < opt.jobs; ++j)
-    {
-      in = fopen(files[j],"rb");
-      ret = MIN_CHUNK_SIZE;
-      while(ret == MIN_CHUNK_SIZE)
-        {
-          ret = fread(buf, 1, MIN_CHUNK_SIZE, in);
-          fwrite(buf, 1, ret, out);
-        }
-      fclose(in);
-    }
-  fclose(out);
-  free(buf);
+  free (main_file);
+  main_file = xstrdup (new_file_name);
 }
 
 /* Delete the temporary files used. */
 void
 delete_temp_files()
 {
-  int j = 0;
-
-  while(j < opt.jobs)
-    unlink(files[j++]);
+  unlink (main_file);
 }
 
 /* Clean the space allocated for temporary files data. */
 void
 clean_temp_files()
 {
-  int i;
-
-  for (i = 0; i < opt.jobs; ++i)
-    free (files[i]);
-  free(files);
+  free (main_file);
 }
 
 /* Allocate ranges array to store the ranges data. */
@@ -188,7 +177,7 @@ spawn_thread (struct s_thread_ctx *thread_ctx, int index, int resource)
   if(!thread_ctx[index].url_parsed)
     return 1;
 
-  thread_ctx[index].file = files[index];
+  thread_ctx[index].file = main_file;
   thread_ctx[index].range = ranges + index;
   (thread_ctx[index].range)->is_assigned = 1;
   (thread_ctx[index].range)->resources[resource] = true;
diff --git a/src/multi.h b/src/multi.h
index ad9bd21..2c28a36 100644
--- a/src/multi.h
+++ b/src/multi.h
@@ -32,6 +32,7 @@ as that of the covered work.  */
 #define MULTI_H
 
 #include <semaphore.h>
+#include <pthread.h>
 
 #include "wget.h"
 
@@ -62,15 +63,15 @@ struct s_thread_ctx
   uerr_t status;
 };
 
-void init_temp_files();
+void init_temp_files(char *);
 
-void name_temp_files();
+void name_temp_files(char *, long long int);
 
-void merge_temp_files(char *);
+void rename_temp_file (char *);
 
-void delete_temp_files();
+void delete_temp_files ();
 
-void clean_temp_files();
+void clean_temp_files ();
 
 void init_ranges();
 
diff --git a/src/retr.c b/src/retr.c
index 2f45fa5..35f500c 100644
--- a/src/retr.c
+++ b/src/retr.c
@@ -1087,7 +1087,7 @@ retrieve_from_file (const char *file, bool html, int *count)
       elect_resources (mlink);
       elect_checksums (mlink);
 
-      init_temp_files();
+      init_temp_files(mlink->files->name);
       init_ranges ();
       thread_ctx = malloc (opt.jobs * (sizeof *thread_ctx));
 
@@ -1108,9 +1108,10 @@ retrieve_from_file (const char *file, bool html, int *count)
       if (j < opt.jobs)
         opt.jobs = j;
 
-      name_temp_files ();
+      name_temp_files (file->name, file->size);
 
       sem_init (&retr_sem, 0, 0);
+
       j = ranges_covered = 0;
 
       resource = file->resources;
@@ -1255,10 +1256,10 @@ retrieve_from_file (const char *file, bool html, int *count)
             sprintf(file_path, "%s/%s", opt.dir_prefix, file->name);
           else
             sprintf(file_path, "%s", file->name);
-          mkalldirs(file_path);
-          merge_temp_files(file_path);
+
+          rename_temp_file (file_path);
           res = verify_file_hash(file_path, file->checksums);
-          free(file_path);
+          free (file_path);
           if(!res)
             {
               ++*count;
@@ -1274,13 +1275,12 @@ retrieve_from_file (const char *file, bool html, int *count)
               logprintf (LOG_VERBOSE, _("Retrying to download(%s). (TRY #%d)\n"),
                          file->name, ++retries + 1);
+              delete_temp_files();
               continue;
             }
         }
     }
 
-  delete_temp_files();
-
   clean_range_res_data();
 
   if (opt.quota && total_downloaded_bytes > opt.quota)
     {
@@ -1292,7 +1292,6 @@ retrieve_from_file (const char *file, bool html, int *count)
       free(thread_ctx);
 
       clean_ranges ();
-      clean_temp_files ();
       delete_mlink(mlink);
     }
   else
@@ -1481,7 +1480,8 @@ rotate_backups(const char *fname)
       sprintf (from, "%s%s%d", fname, SEP, i - 1);
       rename (from, to);
     }
-
+  sprintf (to, "%s%s%d", fname, SEP, 1);
+  rename(fname, to);
   sprintf (to, "%s%s%d", fname, SEP, 1);
   rename(fname, to);
 }
-- 
2.0.0
