Andy,

Sorry for the late reply.
> I've got a quick question on the creation of the dataflow graph in
> HPX. Zach reported yesterday that he can't run dgswem with a really
> large number of time steps on top of LGD/HPX, but the code works fine
> with just a couple of time steps. We create the complete dataflow
> graph before starting the simulation and my suspicion is that this
> simply fails for a couple million time steps. So my question is
> how/when should we fill the dataflow graph?
>
> I assume we should have some kind of iterative procedure for that, so
> that we add a certain number of timesteps with a certain look ahead
> distance to the graph, but I'm not sure which parameters make sense.
>
> For instance, if we assume a total of t timesteps, then right now our
> chunk size c is t and our look ahead distance l is 0. But we could
> also fill in 1000 timesteps up front and then add another step in each
> cycle (l = 1000, c = 1). Or some mixture of this, e.g. fill in 1000 up
> front and then add another 100 every 100 steps. I don't know. This
> already reeks of over engineering. Any ideas?

I think the solution lies not in defining an artificial limit on when to stop generating the dependency tree. The problem is caused by the fact that HPX (for historical - and now efficiency - reasons) uses so-called 'child-stealing' (or continuation-stealing) as its default scheduling mechanism.

Continuation-stealing means that at the point when a new thread is created, the system continues executing the current (parent) thread on the current core, leaving the child (the continuation) to be work-stolen by other cores. Continuation-stealing is known to be prone to resource oversubscription.

Parent-stealing, on the other hand, means that the current core immediately executes the newly created thread, leaving the parent (original) thread to be work-stolen by other cores. Parent-stealing has been shown to bound resource utilization, as it prefers executing existing work over creating new work.
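To make the difference concrete, here is a minimal sketch of the two behaviors using hpx::async (the `work` function is just a placeholder, not anything from DGSWEM):

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/include/async.hpp>

int work(int i) { return i * 2; }

int main()
{
    // Default policy (continuation-stealing): the parent keeps running on
    // this core; the new task is queued and may be stolen by other cores.
    hpx::future<int> f1 = hpx::async(work, 1);

    // hpx::launch::fork (parent-stealing): the new task starts running
    // immediately on this core; the *parent* is left to be stolen.
    hpx::future<int> f2 = hpx::async(hpx::launch::fork, work, 2);

    // Both policies produce the same results; only the scheduling differs.
    return f1.get() + f2.get() == 6 ? 0 : 1;
}
```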
As said, HPX performs continuation-stealing by default. We have implemented parent-stealing as well, though not as efficiently as one would like, since a true implementation of parent-stealing requires compiler support. In HPX you can force parent-stealing by passing hpx::launch::fork as the first argument (the launch policy) to async, dataflow, future::then, etc. Our implementation incurs one additional HPX-thread context switch per new thread (~400ns). We have seen that this leads to almost no overall increase in execution time as long as the work executed by those threads is not too small.

To make a long story short, I'd suggest changing all dataflow(...) calls in DGSWEM/HPX to dataflow(hpx::launch::fork, ...) and seeing if this resolves the issue. It actually should - it would be nice to see whether theory and practice are actually the same for a change ;)

HTH
Regards Hartmut

---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu

_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
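For illustration, the suggested change applied to a generic time-stepping loop might look like this (timestep_fn and the loop structure are placeholders, not the actual DGSWEM code):

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/include/dataflow.hpp>
#include <utility>

// Placeholder for one simulation time step (not the real DGSWEM kernel).
double timestep_fn(double prev) { return prev + 1.0; }

int main()
{
    hpx::future<double> state = hpx::make_ready_future(0.0);

    for (int t = 0; t != 1000000; ++t)
    {
        // Before (default, continuation-stealing): the loop races ahead and
        // materializes the entire dependency graph before much work runs:
        //
        //   state = hpx::dataflow(
        //       [](hpx::future<double> p) { return timestep_fn(p.get()); },
        //       std::move(state));
        //
        // After (parent-stealing via hpx::launch::fork): each step executes
        // as soon as its dependency is ready, throttling graph creation:
        state = hpx::dataflow(hpx::launch::fork,
            [](hpx::future<double> p) { return timestep_fn(p.get()); },
            std::move(state));
    }
    return static_cast<int>(state.get()) == 1000000 ? 0 : 1;
}
```

With the fork policy, the core that satisfies a step's last dependency runs that step immediately instead of queueing it, so the graph-building loop itself gets suspended and stolen rather than running millions of iterations ahead.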
