Re: mod_firehose and pcap format

Graham Leggett Tue, 12 Feb 2013 08:30:35 -0800

On 12 Feb 2013, at 5:31 PM, Thomas <[email protected]> wrote:

> Looking at mod_firehose from trunk, is there any effort going on or already 
> concluded on converting the output of mod_firehose or its parser program 
> firehose to the pcap format? I know it was briefly discussed during the 
> mod_firehose integration proposal but I have not seen any result there. I 
> realize mod_firehose actually aims for something simpler than a full-blown 
> tcpdump/wireshark compatible dump but it would still be neat to be able to do 
> it.



Firehose aims to give you a view of requests inside an HTTP stream rather than 
packets over a wire, and aims to allow you to see the different buckets as they 
were recorded, but also gives enough information to reconstruct the original 
requests and responses back into a usable form.

Firehose was designed for an extremely high load environment, where it is more 
important to deliver the response to the audience at GBE and 10GBE than it is 
to wait for disks and processes to record the firehose packet to a pipe or 
file. Firehose may drop buckets, and this has to be detectable by the 
application reading the raw firehose. It was used to detect "one in a billion" 
request failures, where live traffic was recorded until the problem could be 
found, and then the original traffic could be "played back" to determine if the 
bug was fixed. We were dealing with hundreds of gigabytes of recorded request 
data that was analysed directly on live servers (that volume of data is 
completely impractical to copy around), thus the aim for efficiency in 
processing.

The pcap format (as I've read it in the past) just captures packet streams; 
no relationship between packets is captured. This maps well into the world of 
packet based networking, but not into the world of streams, which HTTP is. 
Firehose cares that a bucket has been dropped, while pcap doesn't (by design, 
packet based networks drop packets).

Software that analyses pcap files expects to find network packets inside. While 
I could fake a TCP packet encapsulated in pcap, lots of questions emerge. Do I 
fake TCP retransmission behaviour if firehose drops a bucket? Do I fake IPv4 or 
IPv6? How do I map the recording of requests to a fake TCP stream when multiple 
requests can run over the same connection? You reach a point where using a 
"designed for packets" encapsulation format works too hard against you: you're 
recording streams with potential holes in them, you care where the holes are, 
and humans want to read the result too.

The firehose format as it stands now is an extension to chunked encoding. A 
single line gives the length and additional parameters, followed by the binary 
chunk, followed by CRLF. The additional parameters give you the number in the 
sequence (allowing you to detect dropped buckets in the stream), and a UUID 
allowing you to reconstitute either a request or a connection. The current 
format is also human readable, which you will want if you care about the 
buckets being sent over the wire and whether they are excessively fragmented. 
The size of each bucket is carefully controlled to ensure that it can be 
written to a pipe atomically, which is why even if httpd sends an 8000 byte 
bucket, firehose will read fewer bytes to ensure a fit. mod_firehose also cares 
if pipes are involved and nobody is listening: the show must go on whether 
firehose works or not.
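
To make the shape of this concrete, here is a minimal Python sketch of the 
idea, not the actual mod_firehose wire format. The header layout 
("<length-hex> <sequence-hex> <uuid>"), the 512 byte record budget (POSIX only 
guarantees that PIPE_BUF is at least 512) and the names encode, decode and 
find_gaps are all assumptions made for illustration; the real header carries 
more parameters than this. It splits a payload into records small enough for a 
single pipe write, reads them back, and reports holes in the sequence.

```python
import io

# Assumption: conservative budget for one atomic pipe write (POSIX minimum
# value of PIPE_BUF). Real systems often allow more, e.g. 4096 on Linux.
MAX_RECORD = 512


def _header(length, seq, stream_id):
    # Hypothetical header line: "<length-hex> <seq-hex> <uuid>\r\n"
    return f"{length:x} {seq:x} {stream_id}\r\n".encode("ascii")


def encode(payload, stream_id, first_seq=0, max_record=MAX_RECORD):
    """Split payload into records that each fit within max_record bytes."""
    records = []
    seq, pos = first_seq, 0
    while pos < len(payload):
        # Worst-case header size plus the trailing CRLF after the chunk.
        overhead = len(_header(max_record, seq, stream_id)) + 2
        room = max_record - overhead
        if room <= 0:
            raise ValueError("max_record too small for the header")
        chunk = payload[pos:pos + room]
        records.append(_header(len(chunk), seq, stream_id) + chunk + b"\r\n")
        pos += len(chunk)
        seq += 1
    return records


def decode(stream):
    """Yield (seq, stream_id, data) from a binary file-like object."""
    while True:
        line = stream.readline()
        if not line:
            return
        fields = line.rstrip(b"\r\n").decode("ascii").split(" ")
        length_hex, seq_hex, stream_id = fields
        # The chunk is binary, so read it by length rather than by line.
        data = stream.read(int(length_hex, 16))
        if stream.read(2) != b"\r\n":
            raise ValueError("record not terminated by CRLF")
        yield int(seq_hex, 16), stream_id, data


def find_gaps(records):
    """Return (stream_id, expected_seq, seen_seq) for each hole found."""
    next_seq = {}
    gaps = []
    for seq, stream_id, _data in records:
        expected = next_seq.get(stream_id, seq)
        if seq != expected:
            gaps.append((stream_id, expected, seq))
        next_seq[stream_id] = seq + 1
    return gaps


if __name__ == "__main__":
    sid = "123e4567-e89b-12d3-a456-426614174000"  # made-up example UUID
    recs = encode(b"x" * 2000, sid)
    del recs[1]  # simulate a dropped bucket
    print(find_gaps(decode(io.BytesIO(b"".join(recs)))))
```

Because the sequence number travels with every record, a reader can tell 
exactly where a hole is without any help from the writer, which is the 
property a pcap encapsulation would not give you for free.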

Regards,
Graham