Re: memcached: UDP and byte ranges

Dustin Sallings Tue, 18 Dec 2007 00:57:36 -0800


On Dec 17, 2007, at 23:15, Aaron Stone wrote:

- Do we need to separate the Request ID from the Message ID?

The purpose of the request ID is effectively to recreate the TCP sequence number. This just isn't necessary when your data are guaranteed to be delivered in order by TCP.

- Do we need to be able to request portions of a value starting
  from some offset? (to handle the now-infamous facebook's-udp-mgets
  are-so-fat, your-mamma's-ethernet-take-it-no-more!)

My only concern about this is that you may very well be requesting a section from a different value on a subsequent request.

- Do we need the server to tell the client how much data is about to
  show up?


        The message header already does that.

I _don't_ see a reason to have separate request id's from message id's.
The combination of a message id and packet number (or byte range, which
I'll get to in a moment) tells us everything we need to know.

It sounds like facebook (does anyone else even use the UDP-based protocol?) already sends multiple messages in a single UDP request. This same thing happens over TCP. UDP is just a different transport, and needs the additional information to do what other transports do automatically.

If we want the ability to request the n-th byte through the end, why not
just ask for the n-th through m-th byte?

(yes, this is the byte range feature that we've all acknowledged is a
bad idea. except that it completely subsumes the functionality of the
UDP packet sequence number and does it even more powerfully.)

No, it's not the same. A UDP get still returns the whole value the same way it does in TCP, except you have a bit more control over the packetization. Retrieving a value by asking for a series of parts of it can't be done atomically.

Add a field to the GET response akin to DNS's "there's more data but you
need to ask for it". The first response packet will tell the client how
long the entire key is in an extras field, and the common header will
tell the client how long the data it got in the initial response is.


        It already does that.

Add a new command, RGET (range-get), that defines a larger extras
section with two additional fields, the offset and the length.

If this didn't use the CAS identifier, there'd be no guarantees that it'd ever be right. If it did, you're left with the problem of finding out what the CAS identifier is.

The client is explicitly allowed to ask for more data than can fit in a
single UDP packet.

It already does, though. You just can't send more data than will fit in a UDP packet.

The server sends as many RGET response packets as it needs to send, with
each one containing enough information (offset and length) to reassemble
the value on the client _without resequencing the packets_!

You can already do that. Once you receive the first packet, you know how many packets there are, what the total size is, and if you can assume all of the packets before the last one will be the same size, you can just fill in the value as the packets arrive.
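
A minimal sketch of that fill-in-place idea, assuming the total value size and the per-packet payload size are known once the first packet has been seen, and that every packet except the last carries the same payload size. The class, method, and parameter names are hypothetical illustrations, not part of the memcached protocol or any client library.

```python
class Reassembler:
    """Fill a value buffer in place as datagrams arrive, in any order."""

    def __init__(self, total_size, chunk_size):
        # Assumption: both sizes are known after the first datagram.
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.buf = bytearray(total_size)
        self.total_size = total_size
        self.chunk_size = chunk_size
        # Number of datagrams needed (ceiling division).
        self.expected = -(-total_size // chunk_size)
        self.seen = set()

    def add(self, seq, payload):
        """Copy payload to its final offset; return True when complete."""
        if seq in self.seen or not 0 <= seq < self.expected:
            # Duplicate or out-of-range datagram: ignore it.
            return self.complete()
        offset = seq * self.chunk_size
        want = min(self.chunk_size, self.total_size - offset)
        if len(payload) != want:
            raise ValueError("unexpected payload size")
        self.buf[offset:offset + want] = payload
        self.seen.add(seq)
        return self.complete()

    def complete(self):
        return len(self.seen) == self.expected
```

Because the offset is derived from the sequence number, nothing has to be held in order: each payload is written straight to its final position, and the value is ready as soon as every sequence number has been seen.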

Rationale:

By eliminating the packet sequence number, we save the client from
having to hold all the pieces in order until it can return the value to
the client application.


        Hopefully that's unnecessary anyway.

By giving offsets in each packet, we avoid the potential problem of
losing the first packet and then being flooded with subsequent packets
that we don't know what to do with.

If that happens frequently, you should be using TCP and not trying to reinvent it.

Note that an rget is *not* a retransmit. If you're not very careful, you may get part of something unrelated to what the rest of the packets represented. If you are careful, you still may end up having to throw away all the other values.
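
To illustrate the "careful" case, here is a rough sketch of a client that assembles a value from a series of range reads and discards everything if the value changed partway through. It assumes, hypothetically, that each range response carries a version token playing the role of a CAS identifier; `fetch_range` is a made-up callback standing in for the proposed RGET, not an existing memcached command or API.

```python
def read_by_ranges(fetch_range, total_size, chunk_size):
    """Assemble a value from range reads, or return None if it changed.

    fetch_range(offset, length) is a hypothetical callback returning
    (version, data); version stands in for a CAS identifier.
    """
    parts = []
    first_version = None
    for offset in range(0, total_size, chunk_size):
        length = min(chunk_size, total_size - offset)
        version, data = fetch_range(offset, length)
        if first_version is None:
            first_version = version
        elif version != first_version:
            # The value was replaced between requests, so the parts
            # already collected belong to a different value.
            return None
        parts.append(data)
    return b"".join(parts)
```

Even with the version check, a change in the middle of the sequence means every part fetched so far is wasted, which is the cost described above.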

Thoughts? Comments?

I really think it's better to either accept the lossiness and general sloppiness of a thin, dumb UDP transport or just use TCP and get all of the rest of the features handled for you by your OS vendor.


--
Dustin Sallings
