Hi again, I have a question about parallelization with both MPI and threads on a NUMA machine. I would like to run one MPI process per socket or NUMA node, with threads inside each MPI process covering all the cores of that socket or node. Can I use hwloc to bind the processes and threads to the right places and keep them there?
Thanks, Ondrej