Hey,

> in our application, we observe that memory consumption is highly dependent
> on the supplied OS-Threads.
> Two otherwise identical runs of the application with "--hpx:threads 4" and
> "--hpx:threads 8" consume 40GB and above 90GB RAM respectively
> (reproducible on different hardware).
> It could be that the additional threads alter the execution
> order and provoke a race condition that leaks large amounts of memory.
> But is there a possible explanation for the memory consumption within HPX
> itself?
>
> Regarding the application:
> We schedule a large number of tasks (approximately 16
> million) that have no dependencies on each other from a simple for loop.
> Some of them might schedule a continuing task when done, resulting in
> approximately 30 million tasks in total.
> Most data should be held on the heap, required stack sizes of tasks should
> not exceed a few hundred bytes. We do not supply a configuration or any
> command line parameters beyond hpx:threads. The application is run on a
> single node, without remote calls of any kind.

This could be explained by the fact that each core has its own
thread-local allocator that holds on to allocated stack segments and
reuses them as needed. Stack segments are created only on demand, but once
a thread has terminated, its segment is kept and reused for subsequently
created threads.

So, if you use one core, only a few threads actually run concurrently and
allocate stack segments. If you use more cores, each of them allocates
stack segments for the threads it has started running...

Given the large number of HPX threads you run, the high memory requirements
make sense to me.

A possible solution for you would be to limit the number of threads that
are alive concurrently (and thus the number of stack segments that are
created). There are several options for doing that:

- use John's limiting_executor, a special executor that holds back
scheduling tasks once a given limit is reached and automatically resumes
scheduling once running tasks have completed
- use the sliding_semaphore, which is more difficult to explain but in
essence does a very similar thing (see the 1d_stencil examples for how it
works)
- come up with your own means of limiting the number of tasks created
simultaneously

HTH
Regards Hartmut
---------------
http://stellar.cct.lsu.edu
https://github.com/STEllAR-GROUP/hpx

_______________________________________________
hpx-users mailing list
hpx-users@stellar.cct.lsu.edu
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users