Paul McCullagh wrote:
Hi Brian and Mark,

On Jul 16, 2008, at 1:03 AM, Brian Aker wrote:

So to me these need to be set:
   error= setsockopt(ptr->fd, SOL_SOCKET, SO_LINGER,
                     &linger, (socklen_t)sizeof(struct linger));
   error= setsockopt(ptr->fd, SOL_SOCKET, SO_SNDTIMEO,
                     &waittime, (socklen_t)sizeof(struct timeval));
   error= setsockopt(ptr->fd, SOL_SOCKET, SO_RCVTIMEO,
                     &waittime, (socklen_t)sizeof(struct timeval));
   (void)fcntl(ptr->fd, F_SETFL, flags | O_NONBLOCK);

What I don't quite understand is, why use non-blocking I/O?

This would be like polling the connection. But after setting a timeout
you shouldn't need to poll.

Another thing:

Have you considered using c10k and a pool of threads?

FYI, this is precisely what the MySQL Proxy does/solves.  Does it make
sense to do this in the microkernel itself or external to it?  Or both...

-jay

Had to sit on this for a while, but I've worked up some courage... Sorry this is so long. Haven't found a way to describe it more succinctly.

I went in a different direction with the network handling in DPM (my shameless mysql proxy clone). Also, I don't know how mysql proxy does this offhand.

- Logical packets are read/written into buffers.
- Buffers are *only* passed in for processing after a complete packet has been read.
- Buffers are *only* written to socket after processing has completed.
- Buffers are written to sockets as fast as the data is taken. As sockets become writable there's a short path to the "find more buffer data and flush" code.

This adds a little more memory management and a little more memory, but it reduces syscalls a great deal. In some part of my mind I believe it helps with avoiding context switches as well. MySQL, by contrast, will do a blocking write on each individual logical packet...
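
Roughly, the read side of that looks something like the sketch below. conn_t, dispatch_packet(), and the fixed-size buffer are just names made up for illustration, not DPM's actual code; the only real detail is the MySQL packet header (3-byte little-endian length plus a sequence byte).

/* Minimal sketch of "only hand off complete packets". Invented names,
 * not DPM's actual code. */
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    int     fd;              /* non-blocking socket */
    uint8_t buf[65536];      /* real code grows this; payloads can reach 16MB */
    size_t  used;            /* bytes currently buffered */
} conn_t;

/* Hand a complete logical packet to the processing layer; does no I/O. */
static void dispatch_packet(conn_t *c, uint8_t *pkt, size_t len)
{
    (void)c; (void)pkt; (void)len;   /* stub for the sketch */
}

/* Called whenever the event loop says the socket is readable. */
static int conn_readable(conn_t *c)
{
    ssize_t n = read(c->fd, c->buf + c->used, sizeof(c->buf) - c->used);
    if (n == 0)
        return -1;                               /* peer closed */
    if (n < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    c->used += (size_t)n;

    /* MySQL wire packets: 3-byte little-endian length + 1-byte sequence id. */
    while (c->used >= 4) {
        size_t payload = (size_t)c->buf[0] | ((size_t)c->buf[1] << 8)
                       | ((size_t)c->buf[2] << 16);
        if (c->used < payload + 4)
            break;                               /* partial packet: wait for more */
        dispatch_packet(c, c->buf, payload + 4);
        memmove(c->buf, c->buf + payload + 4, c->used - (payload + 4));
        c->used -= payload + 4;
    }
    return 0;
}

The point is that the processing layer never touches the socket; it only ever sees whole packets.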

It also means I can do weird things and avoid weird timeslicing from the OS:
- Write to buffers on many sockets (multiplexing, processing)
- Flush all sockets in a loop. (syscalls, no processing)
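
In sketch form, that two-phase loop is something like this (conn_t, process_pending(), and serve_round() are invented names; real code would only revisit the sockets the event loop says are writable):

/* Sketch of the two-phase loop: build responses in memory, then batch the
 * write() syscalls. All names invented for illustration. */
#include <string.h>
#include <unistd.h>

typedef struct {
    int    fd;               /* non-blocking socket */
    char   out[65536];
    size_t out_len;
} conn_t;

void process_pending(conn_t *c);    /* stands in for: append response bytes to c->out */

void serve_round(conn_t *conns, int nconns)
{
    /* Phase 1: processing only -- build up responses in per-connection buffers. */
    for (int i = 0; i < nconns; i++)
        process_pending(&conns[i]);

    /* Phase 2: syscalls only -- flush whatever each socket will accept. */
    for (int i = 0; i < nconns; i++) {
        conn_t *c = &conns[i];
        while (c->out_len > 0) {
            ssize_t n = write(c->fd, c->out, c->out_len);
            if (n < 0) {
                /* EAGAIN/EWOULDBLOCK: socket full, the event loop calls back
                 * when it's writable; anything else, close the connection. */
                break;
            }
            memmove(c->out, c->out + (size_t)n, c->out_len - (size_t)n);
            c->out_len -= (size_t)n;
        }
    }
}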

I also have some minor modifications to this algorithm (not patched into DPM yet) that reduce memory usage a bit and reduce latency for receiving the response.

Obviously this can't work directly within mysql, since you want to stream large results back to the client. In proxy-land you tend to read a chunk, write a chunk, then flush the write before going back to read more data. So you add a tiny bit of latency but batch the syscalls. You also never end up flipping socket options back and forth with setsockopt calls...

How this relates to drizzle/MySQL:

- Blocking writes aren't all that useful. Process as fast as you can and flush in the background. Especially with thread pooling, where you can complete processing with potentially many fewer context switches and pick up the next incoming request via a message-passing interface. The obvious exception is large resultsets.

- Blocking reads aren't useful at all once we have a libevent layer. Pass complete request packets into a message queue, which are then picked up by the thread pool.
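
A minimal sketch of what I mean by the queue part, using plain pthreads as a stand-in for whatever message-passing layer drizzle actually grows (request_t, QSIZE, and worker() are invented names):

/* Sketch: complete request packets go into a bounded queue; a pool of
 * worker threads pulls them off. Illustrative names only. */
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    void   *session;         /* which connection this request belongs to */
    char   *packet;          /* one complete, fully buffered request packet */
    size_t  len;
} request_t;

#define QSIZE 1024
static request_t       queue[QSIZE];
static int             q_head, q_tail, q_count;
static pthread_mutex_t q_lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  q_not_full  = PTHREAD_COND_INITIALIZER;

/* Called from the event loop thread once a full packet has been read. */
void enqueue_request(request_t req)
{
    pthread_mutex_lock(&q_lock);
    while (q_count == QSIZE)
        pthread_cond_wait(&q_not_full, &q_lock);
    queue[q_tail] = req;
    q_tail = (q_tail + 1) % QSIZE;
    q_count++;
    pthread_cond_signal(&q_not_empty);
    pthread_mutex_unlock(&q_lock);
}

/* Every thread in the pool runs this loop; there are no blocking reads anywhere. */
void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_count == 0)
            pthread_cond_wait(&q_not_empty, &q_lock);
        request_t req = queue[q_head];
        q_head = (q_head + 1) % QSIZE;
        q_count--;
        pthread_cond_signal(&q_not_full);
        pthread_mutex_unlock(&q_lock);

        /* ...execute the query against req.session, buffer the response... */
        free(req.packet);
    }
    return NULL;
}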

- This does require significant retooling so session contexts (half of THD?) are separate from the actual thread contexts (the rest of THD). Each connection has a logical session, which gets passed to the processing thread when it handles the actual request. This brings in session variables, temp tables, blah blah blah.
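
Structurally the split is roughly this; the field names are only illustrative, they don't correspond to actual THD members:

/* Sketch: per-connection state vs. per-thread state, so any worker thread
 * can pick up any session. Invented field names. */
#include <pthread.h>
#include <stdint.h>

typedef struct session {
    uint64_t  connection_id;
    int       client_fd;          /* owned by the event loop, not the worker */
    void     *session_variables;  /* @@session settings */
    void     *temp_tables;
    void     *open_transaction;
    /* ...anything that must survive between individual requests... */
} session_t;

typedef struct worker {
    pthread_t  thread;
    void      *scratch_mem;       /* per-request allocations, reset every time */
    session_t *current;           /* bound to a session only while running its query */
} worker_t;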

- The complexity of this work is why I decided on a proxy instead and ended up writing DPM. I kind of view my initial work toward production drizzle as actually being to use DPM as a plugin. DPM can then directly inject fully buffered queries, read results, etc.

The major tradeoff is that I make a conscious decision about how threads get reused: which query contexts are safe for connection pooling, and which client connections need a dedicated backend thread for the duration of their session. At that point I can actually implement a working 'RESET STATE' command by freeing the backend thread back into the mysql thread pool while keeping the client's tcp connection open. It's probably a little faster, at least...

Should drizzle's main IO still use libevent? Probably. The magic is in how we handle buffering of resultsets or incoming queries, and how threads are abstracted into a worker pool, which all goes over my head right now. If we throw in libevent but still require alarms/blocking/sleeping everywhere, there won't be much functional difference, and Mark Callaghan's blocking-with-timeouts design ends up being more efficient (less twiddling).

-Dormando
