Greg Stein wrote:
> On Mon, Nov 9, 2009 at 14:21, Paul Querna <[email protected]> wrote:
>> ...
>> I agree in general, a serf-based core does give us a good start.
>>
>> But Serf Buckets and the event loop definitely do need some more work
>> -- simple things, like if the backend bucket is a socket, how do you
>> tell the event loop that a would-block return value maps to a file
>> descriptor talking to an origin server. You don't want to just keep
>> looping over it until it returns data; you want to poll on the origin
>> socket, and only try to read when data is available.
>
> The goal would be that the handler's (aka content generator, aka serf
> bucket) socket would be processed in the same select() as the client
> connections. When the bucket has no more data from the backend, it
> returns "done for now". Eventually, all network reads/writes finalize
> and control returns to the core loop. If data comes in on the backend,
> the core loops again and that bucket can read/return data.
>
> There are two caveats that I can think of, right offhand:
>
> 1) Each client connection is associated with one bucket generating the
> response. Ideally, you would not bother to read that bucket
> unless/until the client connection is ready for writing. But that
> could create a deadlock internal to the bucket -- *some* data may need
> to be consumed from the backend, processed, and returned to the
> backend to "unstick" the entire flow (think SSL). Even though nothing
> pops out the top of the bucket, internal processing may need to
> happen.
>
> 2) If you have 10,000 client connections, and some number of sockets
> in the system ready for read/write... how do you quickly determine
> *which* buckets to poll to get those sockets processed? You don't want
> to poll 9,999 idle connections/buckets if only one is ready for
> read/write. (Note: there are optimizations around this; if the bucket
> wants to return data, but wasn't asked to, then next time around it
> has the same data; no need to drill way down to the source bucket to
> attempt to read network data; though this kind of sets up a busy loop
> until that bucket's client is ready for writing.)
>
> Are either of these the considerations you were thinking of?
>
> I can certainly see some kind of system to associate buckets and the
> sockets that affect their behavior. Though that could get pretty crazy
> since it doesn't have to be a 1:1 mapping. One backend socket might
> actually service multiple buckets, and vice versa.
>
>> I am also concerned about the patterns of sendfile() in the current
>> serf bucket architecture, and making a whole pipeline do sendfile
>> correctly seems quite difficult.
>
> Well... it generally *is* quite difficult in the presence of SSL,
> gzip, and chunking. Invariably, content is mangled before hitting the
> network, so sendfile() rarely gets a chance to play ball.
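To make the "one select() for everything" picture concrete, something like the sketch below is what I imagine -- illustrative only; conn_ctx_t and the wiring around it are invented here for the example, not anything serf provides today.  Every pollfd's client_data points at the connection context that owns the response bucket, whether the fd is the client socket or a backend socket:

    #include <apr_poll.h>
    #include <apr_network_io.h>
    #include "serf.h"

    /* Invented per-connection context: the client socket, the serf bucket
     * generating its response, and whatever backend socket that bucket may
     * currently be stalled on. */
    typedef struct conn_ctx_t {
        apr_socket_t  *client_sock;
        serf_bucket_t *response;      /* top of the response bucket chain */
        apr_socket_t  *backend_sock;  /* NULL when no backend is involved */
    } conn_ctx_t;

    static void core_loop(apr_pollset_t *pollset)
    {
        while (1) {
            apr_int32_t num, i;
            const apr_pollfd_t *pfds;

            apr_pollset_poll(pollset, -1, &num, &pfds);

            for (i = 0; i < num; i++) {
                conn_ctx_t  *ctx = pfds[i].client_data;
                const char  *data;
                apr_size_t   len;
                apr_status_t rv;

                /* Only touch a bucket when one of "its" descriptors is
                 * ready: the client became writable, or the backend became
                 * readable.  Idle connections are never looked at. */
                rv = serf_bucket_read(ctx->response, SERF_READ_ALL_AVAIL,
                                      &data, &len);
                if (len > 0) {
                    apr_size_t written = len;
                    apr_socket_send(ctx->client_sock, data, &written);
                    /* A real loop would track partial writes and re-arm
                     * APR_POLLOUT on the client socket here. */
                }
                if (APR_STATUS_IS_EAGAIN(rv)) {
                    /* "Done for now": the bucket is stalled on some fd;
                     * leave it alone until poll() reports that fd again. */
                    continue;
                }
                if (APR_STATUS_IS_EOF(rv)) {
                    /* Response complete: tear down ctx, close sockets. */
                }
            }
        }
    }

The piece that sketch glosses over is exactly your caveat 2): when a backend fd fires, the core has to know which bucket cares about it, and the bucket has to be able to say so.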
This brings us straight back to our discussions from the 2000-01 timeframe, when we discussed poll buckets. Pass it up as metadata that we are stalled on an event (at the socket, SSL, etc.) -- sometimes multiple events (e.g. ext_filter is blocked and either needs to read more from the socket, was blocked on its own read, or now has something to write).
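Something like this is what I mean by passing it up as metadata -- purely illustrative, none of these names exist in serf today:

    /* Invented metadata a bucket chain could hand upward along with an
     * EAGAIN: which descriptor(s) it is stalled on, and for which events.
     * An ext_filter-style bucket might report several entries at once
     * (needs to read from the origin socket *and* has output pending). */
    typedef struct poll_interest_t {
        apr_socket_t *sock;             /* descriptor we are stalled on  */
        apr_int16_t   reqevents;        /* APR_POLLIN and/or APR_POLLOUT */
        struct poll_interest_t *next;   /* multiple simultaneous events  */
    } poll_interest_t;

    /* Invented accessor the core would call after a would-block read:
     * walk the chain, collect every descriptor/event pair it is waiting
     * on, and register them with the pollset instead of spinning. */
    poll_interest_t *bucket_poll_interest(serf_bucket_t *bucket,
                                          apr_pool_t *pool);

The core would then apr_pollset_add() each entry with the owning connection as client_data and not look at that bucket again until one of those descriptors fires -- which also answers your caveat 2), since the fd-to-bucket mapping is explicit rather than discovered by polling thousands of idle buckets.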
