Hi,

attached is a patch for warc.c. The change uses url_escape to escape unsafe characters (such as whitespace) in the redirect_location. Up to the current version (1.19), wget (with the warc and warc-cdx flags) writes the redirect_location unescaped. If it contains whitespace (e.g. unescaped error messages or OAuth scope information), the resulting CDX line is nearly impossible to parse, because wget uses whitespace as the field separator.
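
To illustrate the parsing problem, here is a minimal standalone sketch. It is not wget code: escape_field and count_fields are hypothetical helpers written only for this example, and escape_field is a simplified stand-in for percent-encoding, not a copy of what url_escape does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of wget): percent-encode spaces, control
   characters, non-ASCII bytes and '%' so that the result contains no
   field separator. Returns a malloc'd string the caller must free,
   or NULL if allocation fails. */
static char *escape_field(const char *s)
{
  static const char hex[] = "0123456789ABCDEF";
  char *out = malloc(3 * strlen(s) + 1);  /* worst case: every byte escaped */
  char *p = out;

  if (out == NULL)
    return NULL;
  for (; *s; s++) {
    unsigned char c = (unsigned char) *s;
    if (c <= 0x20 || c >= 0x7f || c == '%') {
      *p++ = '%';
      *p++ = hex[c >> 4];
      *p++ = hex[c & 0xf];
    } else {
      *p++ = (char) c;
    }
  }
  *p = '\0';
  return out;
}

/* Hypothetical helper: count space-separated fields in a line,
   the way a naive reader of a space-delimited index would. */
static int count_fields(const char *line)
{
  int n = 0, in_field = 0;

  for (; *line; line++) {
    if (*line == ' ') {
      in_field = 0;
    } else if (!in_field) {
      in_field = 1;
      n++;
    }
  }
  return n;
}
```

With these helpers, a four-field line whose third field is "http://x/?e=access denied" is counted as five fields, while the same line with the field passed through escape_field first ("http://x/?e=access%20denied") is counted as four again.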
The sample CDX writer published by the Internet Archive (https://github.com/internetarchive/CDX-Writer) also applies URL encoding to the redirect_location.

Best regards,
Christof Horschitz

--- warc.c  2016-09-07 11:35:24.000000000 +0200
+++ warc_new.c  2017-03-22 08:32:28.395540715 +0100
@@ -32,6 +32,7 @@
 #include "utils.h"
 #include "version.h"
 #include "dirname.h"
+#include "url.h"
 
 #include <stdio.h>
 #include <stdlib.h>
@@ -1365,6 +1366,8 @@
     mime_type = "-";
   if (redirect_location == NULL || strlen(redirect_location) == 0)
     redirect_location = "-";
+  else
+    redirect_location = url_escape(redirect_location);
 
   number_to_string (offset_string, offset);
 