Hey,

> in our application, we observe that memory consumption is highly dependent
> on the supplied OS-Threads.
> Two otherwise identical runs of the application with "--hpx:threads 4" and
> "--hpx:threads 8" consume 40GB and above 90GB RAM respectively
> (reproducible on different hardware).
> It could be possible that the additional threads alter the execution
> order and provoke a race condition that leaks crazy amounts of memory.
> But is there a possible explanation for the memory consumption within HPX
> itself?
>
> Regarding the application:
> We schedule a large number of tasks (approximately 16
> million) that have no dependencies on each other from a simple for loop.
> Some of them might schedule a continuing task when done, resulting in
> approximately 30 million tasks in total.
> Most data should be held on the heap, required stack sizes of tasks should
> not exceed a few hundred bytes. We do not supply a configuration or any
> command line parameters beyond hpx:threads. The application is run on a
> single node, without remote calls of any kind.

This could be explained by the fact that all cores have their own
thread-local allocators that hold on to allocated stack segments and reuse
them as needed. In general, stack segments are created only when needed, and
they are reused once the threads that previously used them have terminated.
So, if you use one core, only a few threads are actually run and allocate
stack segments. If you use more cores, each of them will allocate stack
segments for the threads that have started running... Given the large number
of HPX threads you run, the high memory requirements make sense to me.

A possible solution for you would be to limit the number of threads that are
created (and thus the number of created stack segments). We have several
options to do that:

- using John's limiting_executor, which is a special executor that holds
  back scheduling tasks once a given limit is reached and automatically
  resumes creating tasks once tasks have run to completion
- using the sliding_semaphore, which is more difficult to explain, but in
  essence does a very similar thing (see the 1d_stencil examples to see how
  it works)
- coming up with your own means to limit the number of tasks to be created
  simultaneously

HTH
Regards Hartmut
---------------
http://stellar.cct.lsu.edu
https://github.com/STEllAR-GROUP/hpx

_______________________________________________
hpx-users mailing list
hpx-users@stellar.cct.lsu.edu, stellar-group.org
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users