Hey guys, late to the party, but I'll add my 2 cents. :)

I can appreciate the line of thought around framing + a binary
protocol from a performance standpoint, but personally I question the
cost vs benefit tradeoff (in terms of your own time). As it stands,
you guys have already made beanstalkd one of the fastest job
scheduling systems out there, and a reliable one! We've been running
many instances of beanstalkd for almost a year now and have never run
into speed as a problem.

My thinking is: by definition, beanstalkd is used to offload
long-running tasks (it's all relative, but stay with me), which means
the scheduling overhead is usually very low compared to the actual
execution time. Since beanstalkd is already darn fast, I think the
practical utility of making it, say, 50% faster is tiny. Of course,
I'm not one to pass up a performance optimization - it's more a
question of priorities.

So on that note, not to hijack the thread, but a few thoughts on what
I would love to see in beanstalkd:

1) The new 'page-to-disk' option is awesome, since it gives us
disaster recovery, and now I'm wondering: what about persistence
proper? Namely, beanstalkd is pure memory at the moment, but what
about crossing that threshold? I realize this is orthogonal to the
performance track, but I would love for beanstalkd to be able to swap
large sets of jobs out to disk once memory is full. Perhaps the
current persister could even serve as a starting point.

Here's the use case: I have ~90 million jobs which need to be
scheduled, which adds up to ~30GB of memory. Now, this is not
outrageous, but it's definitely a non-trivial investment (one big
machine, or a cluster). In reality, though, most of those jobs are far
in the future (+6 hours), so there is no reason for them to be kept in
memory. If they could instead go to disk and be brought into memory as
needed, that would be nice.
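To make this concrete: these far-future jobs are the ones entered with
a large value in the delay field of the protocol's `put` command. A
quick sketch of what one looks like on the wire (the helper function
and the job body are just for illustration):

```python
def make_put(data: bytes, pri: int = 1024, delay: int = 0,
             ttr: int = 60) -> bytes:
    """Build a beanstalkd `put <pri> <delay> <ttr> <bytes>` command."""
    header = f"put {pri} {delay} {ttr} {len(data)}\r\n".encode()
    return header + data + b"\r\n"

# A job that shouldn't run for another 6 hours still sits in RAM for
# the whole delay today; paging jobs like this one out to disk until
# they come due is what I'm asking for.
cmd = make_put(b"resize-image 42", delay=6 * 3600)
```

Multiply that by ~90 million mostly-dormant jobs and the appeal of
spilling the delayed ones to disk is clear.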

2) Named jobs. I've described the use case before
(http://groups.google.com/group/beanstalk-talk/browse_thread/thread/902b13d5ee94983b/3252980f8caeb248?lnk=gst&q=named+jobs#3252980f8caeb248)
and I think a lot of people would benefit from this.

cheers,
ig


On Jan 11, 5:06 pm, "Keith Rarick" <[email protected]> wrote:
> On Sun, Jan 11, 2009 at 1:42 PM, Erich <[email protected]> wrote:
> > The way I read it, multiple messages per packet is not explicitly
> > allowed or denied. Now, my understanding of tcp (and network
> > probgramming...) is that higher level layers shouldn't rely on per
> > packet stuff, but that's not always the case.
>
> Practically the only difference from server's point of view is how
> many bytes come in from a single read() call. Beanstalkd knows what to
> do if it reads more than one command at a time. (This can happen if
> two requests are in one packet or if two packets arrive before a
> read() call.)
>
> Clients can use TCP_CORK or something similar to affect how commands
> get split into packets, if they want the best performance.
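
(To illustrate the quoted point: here's a minimal Python sketch of a
client packing two commands into a single write; the tube name and job
body are made up. The server parses and answers them in order however
the bytes end up partitioned into packets.)

```python
def batch(commands: list[bytes]) -> bytes:
    # One send() carries all the commands; beanstalkd processes them
    # serially and replies in the same order, regardless of how the
    # bytes are split across packets or read() calls.
    return b"".join(commands)

buf = batch([
    b"use mytube\r\n",
    b"put 1024 0 60 5\r\nhello\r\n",
])

# On Linux, a client could additionally set TCP_CORK so that even
# separate send() calls coalesce into full packets:
#   import socket
#   s = socket.create_connection(("localhost", 11300))
#   s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
#   s.sendall(buf)
```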
>
> >   I assumed it would
> > work just fine, but being a pythonista, I like explicit :)
>
> Yeah. The relevant passage is, "For each connection, the server
> processes commands serially in the order in which they were received
> and sends responses in the same order". This statement has no
> qualifications, and it applies regardless of how the bytes of those
> commands are partitioned into packets or the timing of those packets.
>
> I added that sentence after a discussion on the mailing list a while
> ago on this same point, but clearly the description still doesn't go
> far enough.
>
> kr