On Mon, Nov 9, 2009 at 14:21, Paul Querna <[email protected]> wrote:
>...
> I agree in general, a serf-based core does give us a good start.
>
> But Serf Buckets and the event loop definitely do need some more work
> -- simple things, like if the backend bucket is a socket, how do you
> tell the event loop that a would-block rvalue maps to a file
> descriptor talking to an origin server. You don't want to just keep
> looping over it until it returns data, you want to poll on the origin
> socket, and only try to read when data is available.
The goal would be that the handler's (aka content generator, aka serf bucket) socket would be processed in the same select() as the client connections. When the bucket has no more data from the backend, it returns "done for now". Eventually, all network reads/writes finalize and control returns to the core loop. If data comes in from the backend, then the core wakes up and that bucket can read/return data. (rough sketch of this loop in the p.s. below)

There are two caveats that I can think of, right off hand:

1) Each client connection is associated with one bucket generating the response. Ideally, you would not bother to read that bucket unless/until the client connection is ready for writing. But that could create a deadlock internal to the bucket -- *some* data may need to be consumed from the backend, processed, and returned to the backend to "unstick" the entire flow (think SSL). Even though nothing pops out the top of the bucket, internal processing may need to happen.

2) If you have 10,000 client connections, and some number of sockets in the system ready for read/write... how do you quickly determine *which* buckets to poll to get those sockets processed? You don't want to poll 9999 idle connections/buckets if only one is ready for read/write.

(note: there are optimizations around this; if the bucket wants to return data, but wasn't asked to, then next time around it has the same data; no need to drill way down to the source bucket to attempt to read network data; tho this kinda sets up a busy loop until that bucket's client is ready for writing)

Are either of these the considerations you were thinking of?

I can certainly see some kind of system to associate buckets and the sockets that affect their behavior (one approach is sketched in the p.s.). Though that could get pretty crazy, since it doesn't have to be a 1:1 mapping. One backend socket might actually service multiple buckets, and vice versa.

> I am also concerned about the patterns of sendfile() in the current
> serf bucket architecture, and making a whole pipeline do sendfile
> correctly seems quite difficult.

Well... it generally *is* quite difficult in the presence of SSL, gzip, and chunking. Invariably, content is mangled before hitting the network, so sendfile() rarely gets a chance to play ball.

But if you really are just dealing with plain files (maybe prezipped), then read_for_sendfile() should be workable. Most buckets can't do squat with it, and should just use a default function. But the file bucket can return a proper handle. (and it is entirely possible/reasonable that the signature should be adjusted to simplify the process; see the last sketch in the p.s.)

Cheers,
-g
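
p.s. To make the "done for now" idea above concrete, here is a rough sketch of how the core might pump a response bucket toward its client. The serf read API and its return codes are real; conn_t and pump_response are just names I made up for illustration:

    #include "serf.h"
    #include "apr_network_io.h"

    typedef struct conn_t {
        apr_socket_t *client;     /* the client connection */
        serf_bucket_t *response;  /* bucket generating the response */
    } conn_t;

    /* Pump response data toward the client until the bucket says
       "done for now" (APR_EAGAIN) or the response is complete. */
    static apr_status_t pump_response(conn_t *conn)
    {
        while (1) {
            const char *data;
            apr_size_t len;
            apr_status_t status = serf_bucket_read(conn->response,
                                                   SERF_READ_ALL_AVAIL,
                                                   &data, &len);
            if (SERF_BUCKET_READ_ERROR(status))
                return status;

            if (len > 0) {
                apr_size_t written = len;
                apr_status_t rv = apr_socket_send(conn->client, data,
                                                  &written);
                /* a real loop must hold onto data[written..len) when
                   the client socket would block */
                if (rv != APR_SUCCESS && !APR_STATUS_IS_EAGAIN(rv))
                    return rv;
            }

            if (status == APR_EAGAIN)
                return APR_EAGAIN;  /* done for now: back to the core */
            if (status == APR_EOF)
                return APR_EOF;     /* response complete */
        }
    }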
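
And a sketch of one way to associate backend sockets with the buckets they feed: APR's pollset lets you hang a pointer off each descriptor via client_data, so when a socket fires, the core can find the affected bucket directly instead of poking 9999 idle ones. (Names are again illustrative, and the non-1:1 mapping would want a list hanging off client_data rather than a single bucket pointer.)

    #include "apr_poll.h"

    /* register a backend socket, remembering which bucket it feeds */
    static apr_status_t watch_backend(apr_pollset_t *pollset,
                                      apr_socket_t *backend,
                                      serf_bucket_t *bucket,
                                      apr_pool_t *pool)
    {
        apr_pollfd_t pfd = { 0 };

        pfd.p = pool;
        pfd.desc_type = APR_POLL_SOCKET;
        pfd.desc.s = backend;
        pfd.reqevents = APR_POLLIN;
        pfd.client_data = bucket;  /* 1:N would hang a list here */
        return apr_pollset_add(pollset, &pfd);
    }

    /* the core: only touch buckets whose sockets actually fired
       (ignoring EINTR, timeouts, etc. for brevity) */
    static void core_loop(apr_pollset_t *pollset)
    {
        while (1) {
            apr_int32_t num, i;
            const apr_pollfd_t *ready;

            if (apr_pollset_poll(pollset, -1, &num, &ready) != APR_SUCCESS)
                break;
            for (i = 0; i < num; i++) {
                serf_bucket_t *bucket = ready[i].client_data;
                /* drive just this bucket (e.g. pump_response above) */
                (void)bucket;
            }
        }
    }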
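
Lastly, the sendfile path, roughly: ask the bucket via read_for_sendfile(), and if a proper file handle comes back, hand it straight to apr_socket_sendfile(); anything that mangles content (SSL, gzip, chunking) falls back to the ordinary read/send path. I'm glossing over exactly how a bucket that can't do sendfile signals that, which is part of why the signature may want adjusting:

    /* try zero-copy; fall back to plain reads for mangled content */
    static apr_status_t send_response(apr_socket_t *client,
                                      serf_bucket_t *response)
    {
        apr_hdtr_t hdtr = { 0 };
        apr_file_t *file = NULL;
        apr_off_t offset = 0;
        apr_size_t len = 0;
        apr_status_t status;

        status = serf_bucket_read_for_sendfile(response,
                                               SERF_READ_ALL_AVAIL,
                                               &hdtr, &file,
                                               &offset, &len);
        if (status == APR_SUCCESS && file != NULL) {
            /* the file bucket handed us a proper handle */
            return apr_socket_sendfile(client, file, &hdtr,
                                       &offset, &len, 0);
        }

        /* most buckets can't do squat with sendfile; fall through to
           the normal pump (see the first sketch) */
        return APR_ENOTIMPL;  /* placeholder for the fallback path */
    }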
