On Wednesday, 22 February 2017 22:02:57 CET Tim Biedert wrote:
> > We were having troubles when running with multiple localities up until
> > recently. If you update to today's top of master, at least the problems
> > coming from HPX directly should go away.
>
> Hi, thanks for the feedback! Unfortunately, the deadlock remains.
> Any tips how to debug that? (Admittedly, I haven't figured out yet how
> to interactively attach gdb to a running Slurm multi-node batch job.)
Please also make sure you are not affected by this:
https://github.com/STEllAR-GROUP/hpx/issues/2517

In order to check, run your code with -Ihpx.parcel.message_handlers=0
(a small sketch of setting this in code follows at the end of this
message). I got bitten by this only at a scale of 256 recently.

If you are running with at least 256 localities (or are using
hpx::lcos::barrier with more than 256 participating threads), please
be aware of:
https://github.com/STEllAR-GROUP/hpx/issues/2516
(a minimal barrier sketch also follows at the end of this message).

> Interestingly, the deadlock has never occurred when simulating multiple
> localities on a single machine by starting multiple processes using
> mpirun -np. So it's probably a race condition influenced by additional
> network latency on the cluster, I guess.
>
> However, I was wondering: Are there any known issues which might cause
> a (remote) action invocation to stall/deadlock? I don't know if
> that's anything special, but let's say we have a few hundred (or thousand)
> components per locality, which can all communicate
> wildly/asynchronously. Are there any HPX implementation/concurrency
> limits we might reach?
>
> On a side note: When running on a single node, sometimes I get the
> following error for specific extreme cases in my benchmarks:
>
> {what}: mmap() failed to allocate thread stack due to insufficient
> resources, increase /proc/sys/vm/max_map_count or add
> -Ihpx.stacks.use_guard_pages=0 to the command line: HPX(unhandled_exception)
>
> I assume it means out of memory. I'm just wondering, because usually
> Slurm kills my job if I exceed the reserved memory amount.
>
> Best,
> Tim
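
For the message-handler check above: if it is more convenient than
passing -Ihpx.parcel.message_handlers=0 on every launch, the same
setting can be injected at startup. This is only a minimal sketch,
assuming the hpx::init overload that accepts a vector of configuration
entries; adapt it to your application:

#include <hpx/hpx_init.hpp>

#include <string>
#include <vector>

int hpx_main(int argc, char* argv[])
{
    // ... your application code ...
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // Should have the same effect as passing
    // -Ihpx.parcel.message_handlers=0 on the command line, i.e. disable
    // the parcel-layer message handlers while checking for issue #2517.
    std::vector<std::string> const cfg = {
        "hpx.parcel.message_handlers=0"
    };
    return hpx::init(argc, argv, cfg);
}

Running the resulting binary should then behave as if the -I option had
been given on the command line.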
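
And for the barrier note: a minimal sketch of the kind of
hpx::lcos::barrier use that issue #2516 is about once more than 256
threads participate. The symbolic name, the one-participant-per-locality
setup, and the headers pulling in the barrier are assumptions for
illustration only; adapt them to your code:

#include <hpx/hpx.hpp>
#include <hpx/hpx_init.hpp>
#include <hpx/include/lcos.hpp>

#include <cstdint>
#include <string>

int hpx_main(int argc, char* argv[])
{
    // One participating thread per locality; with 256 or more localities
    // this is the situation issue #2516 describes.
    std::uint32_t const num_participants = hpx::get_num_localities().get();
    std::uint32_t const rank = hpx::get_locality_id();

    // Every participant registers under the same symbolic name and blocks
    // until all of them have arrived.
    hpx::lcos::barrier b("example_barrier", num_participants, rank);
    b.wait();

    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::init(argc, argv);
}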
