On Dec 17, 2007, at 23:15, Aaron Stone wrote:

- Do we need to separate the Request ID from the Message ID?

The purpose of the request ID is effectively to recreate the TCP sequence number. This just isn't necessary when your data are guaranteed to be delivered in order by TCP.

- Do we need to be able to request portions of a value starting
  from some offset? (to handle the now-infamous facebook's-udp-mgets
  are-so-fat, your-mamma's-ethernet-take-it-no-more!)

My only concern about this is that on a subsequent request you may very well be reading a section of a different value, if the item was replaced in between.

- Do we need the server to tell the client how much data is about to
  show up?

        The message header already does that.

I _don't_ see a reason to have separate request IDs and message IDs. The combination of a message ID and a packet number (or byte range, which
I'll get to in a moment) tells us everything we need to know.
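To picture what "message ID plus packet number" buys you, here is a minimal Python sketch that packs and parses a per-datagram header modeled on the 8-byte UDP frame header memcached documents for its text protocol: a 16-bit request ID, a 16-bit sequence number, a 16-bit count of datagrams in the message, and 16 reserved bits. Reusing that layout here is an assumption, and `build_datagram` and `parse_datagram` are hypothetical helpers.

```python
import struct

# Assumed 8-byte frame header, network byte order (modeled on memcached's
# UDP frame header): request ID, sequence number, datagram count, reserved.
FRAME_HEADER = struct.Struct("!HHHH")


def build_datagram(request_id: int, seq: int, total: int, payload: bytes) -> bytes:
    """Prefix a payload with the hypothetical frame header."""
    return FRAME_HEADER.pack(request_id, seq, total, 0) + payload


def parse_datagram(datagram: bytes) -> tuple[int, int, int, bytes]:
    """Return (request_id, seq, total, payload) for one received datagram."""
    if len(datagram) < FRAME_HEADER.size:
        raise ValueError("datagram is shorter than the frame header")
    request_id, seq, total, reserved = FRAME_HEADER.unpack_from(datagram)
    if reserved != 0:
        raise ValueError("reserved field must be zero")
    if seq >= total:
        raise ValueError("sequence number out of range")
    return request_id, seq, total, datagram[FRAME_HEADER.size:]


if __name__ == "__main__":
    dgram = build_datagram(request_id=7, seq=1, total=3, payload=b"abc")
    request_id, seq, total, payload = parse_datagram(dgram)
    # (request_id, seq) is enough to file this payload under the right reply.
    print((request_id, seq), total, payload)
```

A receiver can key each payload on (request_id, seq); that pair alone separates the pieces of one reply from those of another.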

It sounds like Facebook (does anyone else even use the UDP-based protocol?) already sends multiple messages in a single UDP request. This same thing happens over TCP. UDP is just a different transport, and needs the additional information to do what other transports do automatically.

If we want the ability to request the n-th byte through the end, why not
just ask for the n-th through m-th byte?

(yes, this is the byte range feature that we've all acknowledged is a
bad idea, except that it completely subsumes the functionality of the
UDP packet sequence number and does it even more powerfully.)

No, it's not the same. A UDP get still returns the whole value the same way it does in TCP, except you have a bit more control over the packetization. Retrieving a value by asking for a series of parts of it can't be done atomically.

Add a field to the GET response akin to DNS's "there's more data but you need to ask for it". The first response packet will tell the client how
long the entire value is in an extras field, and the common header will
tell the client how much data it got in the initial response.

        It already does that.

Add a new command, RGET (range-get), that defines a larger extras
section with two additional fields, the offset and the length.

If this didn't use the CAS identifier, there'd be no guarantee that all of the pieces came from the same version of the value. If it did, you're left with the problem of finding out what the CAS identifier is.
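The risk of stitching together sections of different versions can be made concrete. Below is a minimal Python sketch, assuming a hypothetical server whose ranged read returns the item's CAS identifier and total length alongside each piece; `FakeServer`, `range_get`, and `fetch_in_pieces` are invented for illustration and are not part of any memcached API. The client takes the CAS from the first piece and throws everything away if a later piece reports a different one.

```python
from dataclasses import dataclass


@dataclass
class Item:
    value: bytes
    cas: int


class FakeServer:
    """Hypothetical in-memory server that supports ranged reads."""

    def __init__(self):
        self.items = {}
        self.next_cas = 1

    def set(self, key, value):
        # Every store gets a fresh CAS identifier.
        self.items[key] = Item(value, self.next_cas)
        self.next_cas += 1

    def range_get(self, key, offset, length):
        """Return (cas, total_length, bytes) for one section of the value."""
        item = self.items[key]
        return item.cas, len(item.value), item.value[offset:offset + length]


def fetch_in_pieces(server, key, piece_size, max_attempts=3, between_pieces=None):
    """Fetch a value in sections, restarting whenever its CAS changes."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    for _ in range(max_attempts):
        cas, total, first = server.range_get(key, 0, piece_size)
        parts, offset, torn = [first], len(first), False
        while offset < total:
            if between_pieces:
                between_pieces()  # lets the demo simulate a concurrent writer
            piece_cas, _, piece = server.range_get(key, offset, piece_size)
            if piece_cas != cas:
                # The value was replaced mid-read; what we hold is stale.
                torn = True
                break
            parts.append(piece)
            offset += len(piece)
        if not torn:
            return b"".join(parts)
    raise RuntimeError("value kept changing; giving up")


if __name__ == "__main__":
    server = FakeServer()
    server.set("k", b"aaaaaaaa")
    pending = [b"bbbbbbbb"]

    def writer():
        # Replace the value once, after the first section has been read.
        if pending:
            server.set("k", pending.pop())

    print(fetch_in_pieces(server, "k", 3, between_pieces=writer))
```

Without the CAS comparison, that demo would return b"aaabbbbb", three bytes of the old value followed by five of the new one. With it, a value that keeps changing can still force the client to discard everything and give up.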

The client is explicitly allowed to ask for more data than can fit in a
single UDP packet.

It already does, though. You just can't send more data than will fit in a UDP packet.

The server sends as many RGET response packets as it needs to, each one containing enough information (offset and length) to reassemble
the value on the client _without resequencing the packets_!
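A receiver for the proposed RGET replies could write each piece straight into place. Here is a minimal Python sketch of one, assuming every hypothetical RGET reply carries the value's total length plus its own offset, and that pieces never overlap; `OffsetAssembler` is an invented name, not an existing API.

```python
class OffsetAssembler:
    """Rebuild a value from (offset, data) pieces in any arrival order.

    Assumes each piece states where it belongs, that the total length is
    known from whichever packet arrives first, and that pieces don't overlap.
    """

    def __init__(self, total_length: int):
        self.buf = bytearray(total_length)
        self.seen = set()  # offsets already written
        self.received = 0  # bytes written so far

    def add(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if offset < 0 or end > len(self.buf):
            raise ValueError("piece falls outside the value")
        if offset in self.seen:
            return  # duplicate datagram: already written, ignore it
        self.buf[offset:end] = data
        self.seen.add(offset)
        self.received += len(data)

    def complete(self) -> bool:
        return self.received == len(self.buf)

    def value(self) -> bytes:
        if not self.complete():
            raise ValueError("value is still missing pieces")
        return bytes(self.buf)


if __name__ == "__main__":
    asm = OffsetAssembler(10)
    for offset, data in [(5, b"fghij"), (0, b"abcde"), (5, b"fghij")]:
        asm.add(offset, data)
    print(asm.value())
```

Arrival order doesn't matter here, but nothing in this scheme ties the pieces to one version of the value; that still needs something like a CAS check.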

You can already do that. Once you receive the first packet, you know how many packets there are, what the total size is, and if you can assume all of the packets before the last one will be the same size, you can just fill in the value as the packets arrive.
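Filling in the value as packets arrive needs no offsets in the packets at all. A minimal Python sketch, assuming (as the paragraph does) that the packet count and total size are known once the first packet is in hand, and that every packet but the last carries the same number of payload bytes, `chunk`; `SequencedAssembler` and its parameters are hypothetical.

```python
class SequencedAssembler:
    """Fill a value in place using only per-packet sequence numbers.

    Assumes the packet count and total length are already known and that
    every packet except the last carries exactly `chunk` payload bytes,
    so packet `seq` always starts at byte `seq * chunk`.
    """

    def __init__(self, total_packets: int, total_length: int, chunk: int):
        # Ceiling division: how many chunk-sized packets the length needs.
        if chunk <= 0 or total_packets != max(1, -(-total_length // chunk)):
            raise ValueError("packet count, length and chunk size disagree")
        self.buf = bytearray(total_length)
        self.total_packets = total_packets
        self.chunk = chunk
        self.seen = set()  # sequence numbers already written

    def add(self, seq: int, data: bytes) -> None:
        if not 0 <= seq < self.total_packets:
            raise ValueError("sequence number out of range")
        if seq in self.seen:
            return  # duplicate datagram: ignore it
        start = seq * self.chunk
        expected = min(self.chunk, len(self.buf) - start)
        if len(data) != expected:
            raise ValueError("payload size doesn't match its sequence number")
        self.buf[start:start + expected] = data
        self.seen.add(seq)

    def complete(self) -> bool:
        return len(self.seen) == self.total_packets

    def value(self) -> bytes:
        if not self.complete():
            raise ValueError("value is still missing packets")
        return bytes(self.buf)


if __name__ == "__main__":
    asm = SequencedAssembler(total_packets=3, total_length=10, chunk=4)
    for seq, data in [(2, b"ij"), (0, b"abcd"), (1, b"efgh")]:
        asm.add(seq, data)  # arrival order doesn't matter
    print(asm.value())
```

Because packet `seq` always lands at byte `seq * chunk`, nothing has to be held back waiting for earlier packets.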

Rationale:

By eliminating the packet sequence number, we save the client from
having to hold all the pieces in order until it can return the value to
the client application.

        Hopefully that's unnecessary anyway.

By giving offsets in each packet, we avoid the potential problem of
losing the first packet and then being flooded with subsequent packets
that we don't know what to do with.

If that happens frequently, you should be using TCP and not trying to reinvent it.

Note that an RGET is *not* a retransmit. If you're not very careful, you may get part of something unrelated to what the rest of the packets represented. If you are careful, you may still end up having to throw away all the other pieces.

Thoughts? Comments?


I really think it's better to either accept the lossiness and general sloppiness of a thin, dumb UDP transport or just use TCP and get all of the rest of the features handled for you by your OS vendor.

--
Dustin Sallings


