Hey everyone. Lately I've been doing a lot of work with protocols,
layer 2 and up. This has me thinking about the beanstalk protocol and
evaluating it in a new light.  There are a couple of changes I'd like
to suggest. They break compatibility, so they should be considered
for v2 or some other far-off point.

Without further ado, here are my ideas:

1. Framing: I think beanstalk should have a framing mechanism for its
messages.  Such a mechanism would simplify the parsing of messages by
a lot, and reduce some potential DoS situations (both intentional and,
more importantly, accidental). More on that below. The reason I think
framing would work so well is that beanstalk has 2 atomic units of
communication: the message and the data. Data is already framed, since
the message describing the job (e.g. OK...) includes a job size argument.
So really all I am proposing is framing the messages.

The framing policy could be as simple as: [length of message] message
[\r\n]

The length of a message could be a binary int, a set of ASCII hex
values, or even (fitting with the current protocol) a number in ASCII;
however, these last two would need to be a fixed length (so preceded by
0s or something) to get the benefits of framing.
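
To make the three length encodings concrete, here is a minimal Python
sketch. The function names and the field widths (4 bytes, 8 hex digits,
10 decimal digits) are assumptions picked for illustration; nothing
here is part of the current beanstalk protocol. The length counts the
message only, not the trailing \r\n.

```python
import struct

CRLF = b"\r\n"

def frame_binary(msg: bytes) -> bytes:
    """Prefix msg with a 4-byte unsigned length in network byte order."""
    return struct.pack("!I", len(msg)) + msg + CRLF

def frame_hex(msg: bytes) -> bytes:
    """Prefix msg with its length as 8 zero-padded ASCII hex digits."""
    return b"%08x" % len(msg) + msg + CRLF

def frame_decimal(msg: bytes) -> bytes:
    """Prefix msg with its length as 10 zero-padded ASCII decimal digits."""
    return b"%010d" % len(msg) + msg + CRLF
```

In all three cases the reader knows up front how many bytes the length
field occupies, which is what makes the frame parseable without
scanning for a delimiter.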

So why does this matter? Well for a few reasons:
A. Simplicity of code -- In every client I've looked at, the
socket.recv code would be much simpler. In the case of a binary length
value framing, and a blocking client... it could be replaced with:

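import socket
import struct

# sock is a connected, blocking socket; MSG_WAITALL asks for the full count
lenraw = sock.recv(4, socket.MSG_WAITALL)   # 4-byte length, network order
(length,) = struct.unpack('!I', lenraw)
messg = sock.recv(length + 2, socket.MSG_WAITALL)   # message plus \r\n
if not messg.endswith(b'\r\n'): do_error()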

This avoids having to do reads in a while True loop, waiting for a
condition that may never happen (\r\n) or for a precalculated max
message length.
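
A slightly more defensive version of the same idea, as a rough sketch:
recv() is allowed to return fewer bytes than requested, so the helper
below loops, but only until a known byte count is reached, never in
search of a delimiter. The names recv_exact, read_frame and the
MAX_MESSAGE cap are hypothetical, and the 4-byte binary length is just
one of the encodings discussed above.

```python
import socket
import struct

MAX_MESSAGE = 64 * 1024  # hypothetical cap, not a beanstalkd constant

def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def read_frame(sock):
    """Read one [length] message [\\r\\n] frame and return the message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    if length > MAX_MESSAGE:
        raise ValueError("frame too large")
    body = recv_exact(sock, length + 2)
    if not body.endswith(b"\r\n"):
        raise ValueError("frame not terminated by CRLF")
    return body[:-2]
```

Because the length arrives first, an oversized or malformed message can
be rejected before any of it is buffered, which is where the accidental
DoS protection comes from.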

2. Out of band communications... There are some states which we may
wish to communicate to clients as soon as possible. Examples include
draining mode, disk full, etc.  These could be included by way of
flags attached to the protocol messages (responses) from beanstalkd.
That would avoid a full stats round trip, but would increase the
per-message overhead.  It would also allow for finer grained control
for the clients.
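
One way the flags could look, purely as a sketch: a single byte of bit
flags in front of each response. The ServerFlags names, the bit values
and the one-byte layout are all invented for illustration; beanstalkd
does not define any of them.

```python
import enum

class ServerFlags(enum.IntFlag):
    """Hypothetical server state bits carried on every response."""
    NONE = 0
    DRAINING = 0x01
    DISK_FULL = 0x02

def attach_flags(response: bytes, flags: ServerFlags) -> bytes:
    """Server side: put one flags byte in front of the response."""
    return bytes([int(flags)]) + response

def split_flags(data: bytes):
    """Client side: peel the flags byte off and return (flags, response)."""
    return ServerFlags(data[0]), data[1:]
```

A client could then check something like `ServerFlags.DRAINING in flags`
on every response and back off, at a cost of one extra byte per message.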

3. Binary protocol. This one I'm not sure about. I'm both for and
against it :)  The title explains it all, so instead I present pros
and cons:

Pros:
- fast
- compact (less bandwidth)
- not much different than what already exists; it just enumerates the
acceptable commands, instead of leaving them as text.

Cons:
- client writing is a bit more complex (parsing etc)
- not telnet debuggable (some will argue that writing a reference
telnet-like client is a good first step anyway)
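
To give a feel for what "enumerating" the commands might mean, here is
a small sketch that encodes a put both ways. The opcode numbers and
the fixed header layout (1-byte opcode followed by four 32-bit fields)
are assumptions made up for this example; only the text form follows
the existing put <pri> <delay> <ttr> <bytes> command.

```python
import enum
import struct

class Op(enum.IntEnum):
    """Hypothetical opcodes standing in for the text command names."""
    PUT = 1
    RESERVE = 2
    DELETE = 3

# opcode, priority, delay, ttr, body size; network byte order, no padding
PUT_HEADER = struct.Struct("!BIIII")

def encode_put_binary(pri, delay, ttr, body):
    """Binary form: fixed-size header followed by the raw job body."""
    return PUT_HEADER.pack(Op.PUT, pri, delay, ttr, len(body)) + body

def encode_put_text(pri, delay, ttr, body):
    """Text form, as in the current protocol."""
    return b"put %d %d %d %d\r\n" % (pri, delay, ttr, len(body)) + body + b"\r\n"
```

The binary header is always the same size, so parsing is a single
unpack call. How much bandwidth it saves depends on how large the
numeric arguments are; for small values the text form can be just as
short.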

4. Message batching.  By this I mean allowing more than one message
per packet. I'm pretty sure this would be OK now, but I think it should
be explicitly stated. This allows for some performance tweaks,
particularly on the client end (when combined with "stacking").
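
As a rough sketch of what batching looks like on the wire: the sender
concatenates several messages into one write, and the receiver splits
whatever arrived into complete messages plus a leftover fragment. The
helper names are hypothetical, and this simple split only covers
commands without a job body, since job data may itself contain \r\n.

```python
def batch(commands):
    """Join several body-less commands into one buffer for a single send."""
    return b"".join(cmd + b"\r\n" for cmd in commands)

def split_batch(buf):
    """Return (complete_messages, leftover) for a received buffer."""
    *complete, leftover = buf.split(b"\r\n")
    return complete, leftover
```

The leftover is kept and prepended to the next read, so a message that
straddles two reads is still handled correctly.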


Now the last three are all performance related.  They should be
carefully considered, since beanstalkd is already killer fast, and
we've heard here of memory related bounds, and bandwidth related
bounds, but not CPU related bounds. So maybe we need to evaluate
in terms of those.

Overall I'm not married to any of these of course, I just want to bring
the discussion up.  I will push pretty hard for #1 (framing) because
it simplifies my code :).

What do the rest of you guys think about this?

Regards,
Erich