I did some testing myself while working on something similar, though I still don't know which people would prefer: that I hack init, or that I replace /etc/rc.d/rc with a binary...
Is the problem that the necessary initializations take too long in themselves, or that they take too long only because they are run serially under a single thread?
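
For what it's worth, here is a rough way to tell those two cases apart. It is only a sketch: the helper names (run_timed, compare) are mine, the "init scripts" are stand-in commands that just sleep, and it assumes the steps have no ordering dependencies between them, which real rc scripts often do. It runs the same commands once serially and once concurrently and reports the wall-clock time of each.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_timed(cmd):
    """Run one command and return how long it took, in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=False)
    return time.monotonic() - start


def compare(cmds):
    """Return (serial_wall, parallel_wall) in seconds for the given commands.

    Assumes the commands are independent of each other (hypothetical;
    real init scripts usually have ordering constraints).
    """
    # Serial pass: one command after another, as a plain rc script would.
    start = time.monotonic()
    for cmd in cmds:
        run_timed(cmd)
    serial = time.monotonic() - start

    # Parallel pass: one worker thread per command, each waiting on its child.
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max(1, len(cmds))) as pool:
        list(pool.map(run_timed, cmds))
    parallel = time.monotonic() - start

    return serial, parallel


if __name__ == "__main__":
    # Stand-in "init scripts": each one only sleeps. Replace with real commands.
    fake = [[sys.executable, "-c", "import time; time.sleep(0.3)"]] * 3
    serial, parallel = compare(fake)
    print(f"serial: {serial:.2f}s  parallel: {parallel:.2f}s")
```

If the parallel time comes out close to the serial time, the initializations themselves are the slow part; if it is much shorter, the serial ordering is what costs the time.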
