wget --warc-file=httpbin -qO- https://httpbin.org/get


How can I convert the WARC format into the actual request and response headers?
Greetings
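A WARC file is plain text that wget compresses with gzip by default (httpbin.warc.gz). Pass --no-warc-compression to get an uncompressed httpbin.warc: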

wget --warc-file=httpbin --no-warc-compression -qO response.raw -- https://httpbin.org/get
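If a compressed httpbin.warc.gz is already on disk from the first command, it should not be necessary to fetch the page again. A minimal sketch, assuming gzip is installed; the function name warc_plain is only an illustration and is not part of wget:

```shell
# Hypothetical helper: write the decompressed contents of a .warc.gz to stdout.
# gzip -dc decompresses every gzip member in the file in order and leaves
# the original archive untouched.
warc_plain() {
    gzip -dc -- "$1"
}
```

Usage would look like: warc_plain httpbin.warc.gz > httpbin.warc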

Extract headers with GNU sed:
sed -n -r -e "/WARC-Type: (request|response)/{s/.*: (.)/\n\L\1/;p;:a;N;s/\n$//;Ta;s/.*//;:b;N;s/\n$//;Tb;p;}" httpbin.warc > headers.txt

Extract headers with GNU awk:
awk "{if(/WARC-Type: (response|request)/){print n;hp=1;np=0;}if(hp){if(np){if(!$1){np=0;hp=0;}else print}if(!np&&!$1)np=1;}}" httpbin.warc > headers.txt
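In a POSIX shell, a double-quoted awk program has its $1 expanded by the shell before awk ever sees it, and WARC lines end in CRLF, so a "blank" line may still contain a carriage return. The following is a sketch of the same idea for that environment, not a tested replacement. The function name warc_http_headers and the "# request" / "# response" label lines are my own additions.

```shell
# Hypothetical helper: print the HTTP headers of each request/response record.
warc_http_headers() {
    awk '
        { sub(/\r$/, "") }                   # drop the CR of CRLF line endings
        /^WARC-Type: (request|response)$/ {  # a record whose headers we want
            print "# " substr($0, 12)        # label line: "request" or "response"
            state = 1                        # now inside the WARC record header
            next
        }
        state == 1 && $0 == "" { state = 2; next }            # WARC header ended, HTTP header follows
        state == 2 && $0 == "" { state = 0; print ""; next }  # HTTP header ended
        state == 2 { print }                 # an HTTP header line
    ' "$@"
}
```

Usage would look like: warc_http_headers httpbin.warc > headers.txt
The single quotes keep the shell from touching $0 and the regular expressions. Stripping the carriage return first lets the blank-line tests work on files with CRLF line endings.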


Best regards.
