My library uses a straightforward reactor approach to handle incoming events (I/O, timers, etc.). The library is structured as one thread per core, with as many coroutines (fibers) per thread as memory will allow. The threads communicate with each other via lock-free single-writer, single-reader FIFOs.
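To make the inter-thread FIFO idea concrete, here is a minimal sketch of a bounded single-writer, single-reader ring buffer. It is not the library's actual code: the name `SpscFifo`, the power-of-two capacity, the 64-byte alignment, and the `try_push`/`try_pop` interface are all assumptions chosen for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical type, not the library's real FIFO.
// Safe only with exactly one writer thread and one reader thread.
template <typename T, std::size_t Capacity>
class SpscFifo {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Writer side only. Returns false if the FIFO is full.
    bool try_push(T value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // no free slot
        slots_[head & (Capacity - 1)] = std::move(value);
        // Release: the slot write above becomes visible before the new head.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Reader side only. Returns std::nullopt if the FIFO is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // nothing to read
        T value = std::move(slots_[tail & (Capacity - 1)]);
        // Release: the slot is handed back to the writer only after the read.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    // Assumed 64-byte cache lines; separate lines avoid false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};  // written by the writer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by the reader
};
```

Because each index is written by exactly one thread, no compare-and-swap loop is needed; a pair of acquire/release loads and stores is enough. With N threads you would presumably keep one such FIFO per ordered pair of threads, though that layout is a guess rather than something stated above.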
Each threadCore has its own reactor, with one limitation: only the threadCore0 reactor (the default thread) can wait for signals and child processes. Windows uses a "proactor" model instead of a reactor: you start the I/O operation first and then wait for its completion notification. I've modified my reactor so that it presents a reactor facade even on Windows systems. Performance suffers a bit, but you pick your platform and you takes your chances.
