On Wed, Sep 15, 2010 at 9:04 AM, Andrew Stitcher <astitc...@redhat.com> wrote:
> On Tue, 2010-09-14 at 12:39 -0700, a fabbri wrote:
<snip>
>>
>> We could just use pthread_spin_lock() around your state transition
>> case statement, instead of spinning on the primitive compare and swap.
>> Not sure how much more readable that would be, but it happens to be
>> more my style. ;-)
>
> It's not clear to me this would necessarily avoid the contention, but it
> is certainly worth thinking about.
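For concreteness, here's roughly what I mean by the two alternatives
above. This is just an untested sketch with made-up names (ioState,
nextState), not the actual AsynchIO code, with std::atomic standing in
for the compare-and-swap primitive:

#include <pthread.h>
#include <atomic>

int nextState(int current) { return current + 1; }  // placeholder transition

// Variant 1: a pthread spinlock around the whole state transition.
// pthread_spin_init(&stateLock, PTHREAD_PROCESS_PRIVATE) must run once first.
pthread_spinlock_t stateLock;
int ioState = 0;                       // stand-in for the real state enum

void transitionWithSpinlock() {
    pthread_spin_lock(&stateLock);     // busy-waits; never sleeps in the kernel
    ioState = nextState(ioState);      // the "trivial" critical section
    pthread_spin_unlock(&stateLock);
}

// Variant 2: the hand-rolled do { } while (compare-and-swap) idiom.
std::atomic<int> atomicState(0);

void transitionWithCAS() {
    int current = atomicState.load();
    int desired;
    do {
        desired = nextState(current);  // recompute from the observed state
        // On failure, compare_exchange_weak reloads 'current' with the
        // value it actually saw, so the next try starts from that.
    } while (!atomicState.compare_exchange_weak(current, desired));
}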
Let me try to clarify. Was the performance problem you were seeing before
something like:

A. Before your boolCompAndExchange stuff, you used a scoped mutex, which is
   a pthread_mutex under the covers.

B. pthread_mutexes put threads to sleep when there is contention, which
   ends up hurting performance because your critical sections are trivial
   (small, with no sleeping or syscalls inside).

Is that about right? Evidence of B would have been something like oprofile
output showing kernel mutex activity. (Linux userspace mutexes don't enter
the kernel unless there is contention.)

Adaptive spin/sleep mutex implementations spin for a little while before
sleeping to avoid this problem. These have existed in places like the
FreeBSD kernel for a long time, but I don't think the Linux pthread_mutex
(userspace) implementation has this yet.

In this case, if you know your critical sections are "trivial", you can
use spinlocks. My impression is that you've implemented your own sort of
spinlock with the do { } while (comp_exchange) idiom, but a pthread
spinlock may be slightly more readable. I'd expect both to perform
similarly.

Hope that is clearer.

>> <snip>
>>
>> > The entire purpose of this state machine is to arbitrate between calls
>> > to dataEvent() and notifyPendingWrite() which can happen on different
>> > threads at the same time.
>>
>> Segue to a related question I have... Can you help me understand, or
>> just point to docs/code, the threads involved here?
>>
>> The upper layers ("app") call notifyPendingWrite() from whatever
>> thread they want. dataEvent() gets called from the poller thread. Is
>> it correct that there is typically only one poller thread per
>> Blah::AsynchIO instance?
>
> No, the IO threads are entirely independent of the number of
> connections. The rule is something like 1 IO thread per CPU (this needs
> to be revisited in the light of the NUMA nature of current multicore,
> multisocket machines).

Thanks for the clarification. Can you point me to where in the code these
threads are spawned?

> The IO threads all loop in parallel doing something like:
>
> Loop:
>   Wait for IO work to do
>   Do it.

All threads wait (select or epoll) on the same set of file descriptors,
right? Doesn't this mean that all IO threads race to service the same
events? That is, do all N threads wake up when an fd becomes readable?

In the Linux/epoll case, does using EPOLLONESHOT mean that only one thread
gets woken up, or that they all wake up, but only once until the fd is
rearmed? (I didn't see this spelled out in the man pages.) A rough sketch
of the pattern I'm asking about is further down, after the quoted bits.

> The upper layer threads can be entirely different threads (in the case
> of a client) or in the case of the broker an arbitrary IO thread
> (possibly one currently processing an event from this connection,
> possibly processing another connection). The broker does nearly all its
> work on IO threads. The exceptions are Timer events which are at least
> initiated on their own thread, and I think some management related work.

>> Where does the --worker-threads=N arg to the CPP qpidd broker come
>> into play?
>
> This overrides the default selection of number of IO threads.

>> Finally--perhaps a can of worms-- but why does notifyPendingWrite()
>> exist, instead of just writeThis(). Is this part of the "bottom-up
>> IO" design? I feel like having the app tell us it wants to write (so
>> call me back) is more complex than just having a writeThis(buf) method.
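Here's that epoll sketch, just to be concrete about what I'm asking. This
is untested and not the actual Poller code -- just the shared-epoll-set
pattern with EPOLLONESHOT and an explicit re-arm:

#include <sys/epoll.h>

// Every IO thread runs this loop against the same epollFd.
void ioThreadLoop(int epollFd) {
    for (;;) {
        struct epoll_event ev;
        int n = epoll_wait(epollFd, &ev, 1, -1);   // all threads block here
        if (n <= 0)
            continue;                              // (ignoring errors for brevity)

        int fd = ev.data.fd;
        // ... service the readable/writable fd here ...

        // With EPOLLONESHOT the fd is disabled after this one event is
        // delivered, so it must be explicitly re-armed before it can fire:
        struct epoll_event rearm;
        rearm.events = EPOLLIN | EPOLLONESHOT;
        rearm.data.fd = fd;
        epoll_ctl(epollFd, EPOLL_CTL_MOD, fd, &rearm);
    }
}

// Registration is done once per fd elsewhere, with
// epoll_ctl(epollFd, EPOLL_CTL_ADD, fd, &ev) and the same event flags.

(Back to the notifyPendingWrite() question quoted just above:)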
>
> It currently works like this to ensure that the actual writes happen
> correctly serialised to the connection processing, ie when the callback
> for "please write something" happens we can be sure that nothing else is
> happening on the connection.

Humm. You can serialize writes either way, right? Just put them in a queue
(or return an error if the connection is down). Maybe I'm missing the
point. It seems like the current flow:

  aio->notifyPendingWrite()
  -> the idle() callback fires
  -> idle() calls queueWrite()
  -> if queueWrite() cannot post the send, it calls the full() callback

could be simplified to

  aio->queueWrite()

with some changes in semantics and/or the introduction of a queue of
outgoing-but-not-posted sends. (A rough sketch of what I'm imagining is in
the P.S. below.)

Thanks again,
Aaron
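P.S. For concreteness, the queueWrite()-only shape I'm imagining is
something like the following. This is purely a hypothetical sketch -- the
names (AsynchWriter, tryPostSend, writable) are made up, it's not the
existing AsynchIO API, and locking/buffer ownership are ignored:

#include <deque>

struct Buffer;                          // stand-in for the real buffer type

class AsynchWriter {
    std::deque<Buffer*> pending;        // outgoing-but-not-posted sends

    bool tryPostSend(Buffer*) {         // placeholder: real code would write()
        return false;                   // or post an async send here
    }

public:
    // App-facing call, replacing notifyPendingWrite()/idle()/full():
    void queueWrite(Buffer* buf) {
        if (pending.empty() && tryPostSend(buf))
            return;                     // fast path: went straight out
        pending.push_back(buf);         // otherwise park it for later
    }

    // Called from an IO thread when the fd becomes writable again.
    void writable() {
        while (!pending.empty() && tryPostSend(pending.front()))
            pending.pop_front();
    }
};

The pending queue would of course need the same serialisation treatment we
were discussing above, whether that's the state machine, a spinlock, or a
mutex.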