@Yichao Yu: Sure. I'm aware that there is a lot of overhead in interprocess communication. This was just a minimal test case, not something I expect to run fast(er). (A silly nworker() implementation indeed ;-)) I was just curious whether it is possible to reduce the type instability that shows up in parallel programming situations like this (and others). One reason is that such type instability seems to "mask" @time results (the number of allocations and the memory usage are typically much larger for type-unstable code, right?), making it harder to get a quick idea of the performance/type stability of the parts of the code that really matter...
@Richard Dennis: That's interesting! Do you (or anyone else) have an idea why? What changed in 0.5.0-rc4 compared to 0.4.6?