Hi,

Thanks for the bug report.
Wget's WARC support is rudimentary, and at the moment it only supports the older WARC/1.0 standard. Under the WARC/1.0 specification, the URI should be printed with the `<` and `>` characters; this was changed in the WARC/1.1 specification. It looks like the Wayback Machine does not like WARC/1.0-style archives.

Unfortunately, I cannot apply your patch as-is, since it would break compatibility with WARC/1.0. Sadly, while we've wanted to update the implementation to WARC/1.1, there hasn't been much interest from people in contributing that code.

On Thu, Oct 31, 2024, at 16:04, ferencz.mar...@icore.ro wrote:
> Good afternoon,
>
> We had an issue with creating correct warc files with wget (even with the
> latest one 1.24.5). The issue was caused by Wget saving the WARC-Target-URI
> record with starting < and ending > characters. This could not be processed
> by wayback machine on the replay.
>
> Reading the wiki, noticed that WARC-Target-URI should not contain <>
> characters
>
> https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/
>
> So, I've updated the source of warc.c file with:
>
> //after
> static bool
> warc_write_header_uri (const char *name, const char *value)
> {
>   if (value)
>     {
>       warc_write_string (name);
>       warc_write_string (": <");
>       warc_write_string (value);
>       warc_write_string (">\r\n");
>     }
>   return warc_write_ok;
> }
>
> //added
> static bool
> warc_write_header_url (const char *name, const char *value)
> {
>   if (value)
>     {
>       warc_write_string (name);
>       warc_write_string (": ");
>       warc_write_string (value);
>       warc_write_string ("\r\n");
>     }
>   return warc_write_ok;
> }
>
> Where I found WARC-Target-URI
>
> Like: warc_write_header_uri ("WARC-Target-URI", url);
>
> I've changed it to:
>
> warc_write_header_url ("WARC-Target-URI", url);
>
> This way the newly compiled wget did create good warc files.
>
> Maybe it could be included in the upcoming release.
>
> Thank you.
>
> Best regards,
>
> Ferencz Marton
>
> CEO, iCore Outsourcing SRL
> Mobile: +40721275853
> Phone: +40368426655
> Email: ferencz.mar...@icore.ro
> Website: https://www.icore.ro
> Address: Str. Dr. Victor Babes Nr. 36 Birou 1.10
> 500073 Brasov Romania
>
> Attachments:
> * image003.png