Tim Rühsen <tim.ruehsen <at> gmx.de> writes:

> Unzipping it and zipping it again results in a 2387 byte file.
>
> So, for a first glimpse, it looks like Wget compresses very suboptimal.
> But I won't say it is a bug before I take a deeper look... (in the next days).

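That's probably working as intended. By convention, warc.gz files use concatenated GZip records (one GZip member per WARC record) rather than a single GZipped stream, so that individual items can be recovered via their byte offset. Because each record is compressed on its own, with its own header and trailer and no redundancy shared between records, the file is usually larger than the same data compressed as one stream, which would explain the smaller size you see after unzipping and zipping it again.

This is allowed by the GZip spec, but it is not widely known or used, which causes much confusion. I rather wish the spec had defined some other file extension for this case.

Thanks,
Andy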
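
For illustration, here is a minimal Python sketch of the per-record layout described above. It is not Wget's code: the `records` list is made-up sample data standing in for WARC records, and `write_members` and `read_member` are hypothetical helper names. It only uses the standard `gzip` and `zlib` modules.

```python
import gzip
import zlib

# Made-up sample data standing in for WARC records (assumption, not real WARC).
records = [b"record %d: " % i + b"abc" * 50 for i in range(3)]


def write_members(recs):
    """Compress each record as its own gzip member and note where it starts."""
    out = bytearray()
    offsets = []
    for rec in recs:
        offsets.append(len(out))      # byte offset of this member in the file
        out += gzip.compress(rec)     # complete member: header, data, trailer
    return bytes(out), offsets


def read_member(blob, offset):
    """Decompress only the gzip member that starts at the given byte offset."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    # Decompression stops at the end of this member; any later members
    # are left untouched in d.unused_data.
    return d.decompress(blob[offset:])


blob, offsets = write_members(records)

# A multi-member file still decompresses as a whole to the concatenated records.
whole = gzip.decompress(blob)

# Random access: pull out the second record without touching the first.
second = read_member(blob, offsets[1])

# Size of the per-record layout versus one single gzip stream.
single_stream = gzip.compress(b"".join(records))
print(len(blob), len(single_stream))
```

With similar records like these, the single stream comes out smaller, which is the same effect as unzipping the warc.gz and zipping it again. The trade-off is that the single stream loses the ability to seek to a record by byte offset.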
