Marc from Facebook emailed me a few comments about the binary protocol documentation that I've been working on, and got me thinking more about integrating UDP support into the core protocol (much to Marc's chagrin, I think).

Here are some open issues with UDP (some from Marc today, and some from the hackathon at Yahoo! when we were talking about the Message ID):

- Do we need to separate the Request ID from the Message ID?
- Do we need to be able to request portions of a value starting from some offset? (to handle the now-infamous Facebook's-udp-mgets-are-so-fat, your-mamma's-ethernet-take-it-no-more!)
- Do we need the client to tell the server how big a receive window it has?
- Do we need the server to tell the client how much data is about to show up?

Some thoughts:

I _don't_ see a reason to have request IDs separate from message IDs. The combination of a message ID and packet number (or byte range, which I'll get to in a moment) tells us everything we need to know.

If we want the ability to request the n-th byte through the end, why not just ask for the n-th through m-th byte? (Yes, this is the byte range feature that we've all acknowledged is a bad idea, except that it completely subsumes the functionality of the UDP packet sequence number and does it even more powerfully.)

If the client can discover the size of the data, then it can control the receive window by asking for byte ranges.

Proposal:

Add a field to the GET response akin to DNS's "there's more data but you need to ask for it". The first response packet will tell the client, in an extras field, how long the entire value is, and the common header will tell the client how much of that data arrived in the initial response.

Add a new command, RGET (range-get), that defines a larger extras section with two additional fields, the offset and the length. The client is explicitly allowed to ask for more data than can fit in a single UDP packet. The server sends as many RGET response packets as it needs to send, with each one containing enough information (offset and length) to reassemble the value on the client _without resequencing the packets_!
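
To make the reassembly idea concrete, here is a rough Python sketch of what the proposal above might look like on both ends. Nothing here is a wire format: `rget_chunks`, `Reassembler`, and the `max_payload` limit are hypothetical names invented for illustration, and the real extras layout is still undecided.

```python
def rget_chunks(value: bytes, offset: int, length: int, max_payload: int):
    """Server side (hypothetical): yield (offset, payload) pairs covering the
    requested byte range. Each pair depends only on the request arguments,
    so the server keeps no per-client state between packets."""
    end = min(offset + length, len(value))
    for start in range(offset, end, max_payload):
        yield start, value[start:min(start + max_payload, end)]


class Reassembler:
    """Client side (hypothetical): a buffer sized from the total length
    reported in the first response, filled in whatever order packets arrive."""

    def __init__(self, total_length: int):
        self.buf = bytearray(total_length)
        self.received = []  # half-open (start, end) ranges seen so far

    def add(self, offset: int, payload: bytes) -> None:
        end = offset + len(payload)
        if offset < 0 or end > len(self.buf):
            raise ValueError("chunk falls outside the advertised value length")
        # Stuff the payload straight into place; no sequence numbers needed.
        self.buf[offset:end] = payload
        self.received.append((offset, end))

    def missing_ranges(self):
        """Return (offset, length) pairs that still need to be requested."""
        gaps = []
        pos = 0
        for start, end in sorted(self.received):
            if start > pos:
                gaps.append((pos, start - pos))
            pos = max(pos, end)
        if pos < len(self.buf):
            gaps.append((pos, len(self.buf) - pos))
        return gaps
```

In this sketch a client would create a `Reassembler` with the length from the first response, call `add` for every packet in arrival order, and then issue a fresh RGET for each `(offset, length)` pair that `missing_ranges` returns.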
If the client needs to rate limit the response, it can send separate RGET requests, each one asking for only as much data as the client can handle at that moment.

Rationale:

By eliminating the packet sequence number, we save the client from having to hold all the pieces in order until it can return the value to the client application. By giving offsets in each packet, we avoid the potential problem of losing the first packet and then being flooded with subsequent packets that we don't know what to do with. We also give the client the ability to set up a receive buffer of the size indicated in the first packet and then blindly stuff the subsequent packet values into the right locations in this buffer. If one chunk is missing, it can be specifically re-requested, too.

Yes, if the data-check mismatches during this operation, you've got to start asking all over again, but I think that's a fundamental problem with asking for a large value over UDP that might require re-requests (as opposed to TCP handling the retransmit for you).

I believe this all to be completely stateless on the server.

Thoughts? Comments? I'm going to draw up some protocol pictures this week to show what this might look like.

Aaron
