Andy,

Sorry for the late reply.

> I've got a quick question on the creation of the dataflow graph in
> HPX. Zach reported yesterday that he can't run dgswem with a really
> large number of time steps on top of LGD/HPX, but the code works fine
> with just a couple of time steps. We create the complete dataflow
> graph before starting the simulation and my suspicion is that this
> simply fails for a couple million time steps. So my question is
> how/when should we fill the dataflow graph?
> 
> I assume we should have some kind of iterative procedure for that, so
> that we add a certain number of timesteps with a certain look ahead
> distance to the graph, but I'm not sure which parameters make sense.
> 
> For instance, if we assume a total of t timesteps, then right now our
> chunk size c is t and our look ahead distance l is 0. But we could
> also fill in 1000 timesteps up front and then add another step in each
> cycle (l = 1000, c = 1). Or some mixture of this, e.g. fill in 1000 up
> front and then add another 100 every 100 steps. I don't know. This
> already reeks of over engineering. Any ideas?

I don't think the solution lies in defining an artificial limit on when to
stop generating the dependency tree. The problem is caused by the fact that
HPX (for historical, and now efficiency, reasons) uses so-called
'child-stealing' as its default scheduling mechanism.

Child-stealing means that at the point when a new thread is created, the
system continues executing the current (parent) thread on the current core,
leaving the child to be work-stolen by other cores. Child-stealing is known
to be prone to resource oversubscription: the parent can keep creating work
much faster than that work gets executed.

On the other hand, parent-stealing (usually called continuation-stealing in
the literature, because it is the parent's continuation that gets stolen)
means that the current core continues by executing the newly created thread
immediately, leaving the parent (original) thread to be work-stolen by other
cores. Parent-stealing has been shown to limit resource utilization, as it
prefers executing work over creating new work.

As said, HPX performs child-stealing by default. We have implemented
parent stealing as well, though not as efficiently as we would like, since
an implementation of true parent stealing requires compiler support. In HPX
you can force parent-stealing by passing hpx::launch::fork as the first
argument (the launch policy) to async, dataflow, future::then, etc. Our
implementation incurs one additional HPX-thread context switch per new
thread (~400ns). We have seen that this leads to almost no increase in
overall execution time as long as the work executed by those threads is not
too fine-grained.
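
To illustrate why the stealing strategy matters for resource usage, here is
a toy single-worker model in plain C++. It is not HPX code and says nothing
about HPX internals; the function names are made up for this sketch. It only
counts how many runnable items sit in the work queue while a parent loop
spawns n child tasks under each strategy.

```cpp
#include <algorithm>
#include <cstddef>

// Toy model (assumption: one worker, children spawn nothing themselves).

// Child-stealing: each spawn enqueues the child and the parent keeps
// running, so children pile up until the parent loop has finished.
constexpr std::size_t peak_queue_child_stealing(std::size_t n)
{
    std::size_t queued = 0;
    std::size_t peak = 0;
    for (std::size_t i = 0; i != n; ++i)
    {
        ++queued;                        // child waits in the queue
        peak = std::max(peak, queued);
    }
    while (queued != 0)                  // parent is done, drain the children
        --queued;
    return peak;
}

// Parent-stealing: each spawn enqueues the parent's continuation and runs
// the child right away; afterwards the worker resumes the parent.
constexpr std::size_t peak_queue_parent_stealing(std::size_t n)
{
    std::size_t queued = 0;
    std::size_t peak = 0;
    for (std::size_t i = 0; i != n; ++i)
    {
        ++queued;                        // parent continuation is enqueued
        peak = std::max(peak, queued);
        // ... the child runs to completion here ...
        --queued;                        // worker picks the parent up again
    }
    return peak;
}
```

In this model the queue grows linearly with n under child-stealing, while it
never holds more than one item under parent-stealing. Real schedulers with
several cores and dependent tasks behave less cleanly, so treat this as
intuition only.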

To make a long story short, I'd suggest replacing all dataflow(...) function
calls in DGSWEM/HPX with dataflow(hpx::launch::fork, ...) and seeing if this
fixes the issue. It actually should - it would be nice to see whether theory
and practice are actually the same for a change ;)

HTH
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu



_______________________________________________
hpx-users mailing list
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
