I wanted to follow up on this since I was able to find a solution which
works for me. Hopefully it will help someone :)
Thank you very much for the help.
1) As suggested by Dr. Bangerth above, MPI_THREAD_MULTIPLE can work fine
with deal.II if you are willing to handle the initialization of MPI and the
libraries yourself (sketched below).
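For reference, a minimal sketch of what I mean by handling the initialization
myself, assuming only MPI needs to be set up (if you also rely on
Utilities::MPI::MPI_InitFinalize to initialize PETSc, p4est, etc., those would
have to be initialized by hand as well):

#include <mpi.h>

#include <cstdlib>
#include <iostream>

int main(int argc, char *argv[])
{
  int provided = MPI_THREAD_SINGLE;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE)
    {
      std::cerr << "This MPI does not support MPI_THREAD_MULTIPLE." << std::endl;
      MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }

  // ... initialize any other libraries, set the deal.II thread limit, and
  // run the program as usual ...

  MPI_Finalize();
  return 0;
}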
2) Also as suggested above, a typical use of a serial mutex (std::mutex), a
parallel mutex (Utilities::MPI::CollectiveMutex), or a combination thereof
will not do the trick (a sketch of why is just below).
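To illustrate, here is a sketch of the straightforward "parallel mutex"
version, assuming the usual Utilities::MPI::CollectiveMutex::ScopedLock
interface. Its problem is that the thread which happens to win the lock can
differ from process to process, so the collective calls underneath need not
match up across ranks:

#include <deal.II/base/mpi.h>

void my_fun_naive(const MPI_Comm comm)
{
  // One CollectiveMutex shared by all threads on this process.
  static dealii::Utilities::MPI::CollectiveMutex mutex;
  dealii::Utilities::MPI::CollectiveMutex::ScopedLock lock(mutex, comm);

  // Collective Trilinos/PETSc call here: only one thread per process gets in,
  // but not necessarily the *same* work group on every process.
}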
3) However, while not extensively tested, I have run into no issues with
something like the code below. The idea is that with a parallel mutex we have
no control over which thread gets access to the shared communicator, only that
exactly one thread on each process gets access. The key is to put the root
process in charge of which work group/thread id gets released on every
process. Put more simply: Utilities::MPI::CollectiveMutex lets one thread
through; my custom mutex lets a specific thread through.
I was a little surprised to see that neither this way of solving the problem
nor moving to MPI_THREAD_MULTIPLE caused a performance hit, and the result is
in fact a sizable improvement over MPI-only runs, but I am not currently
scaling to a large machine, so this could change.
#include <mpi.h>

#include <mutex>
#include <vector>

// Must be accessible to all threads which might enter my_fun() on each MPI
// process; communicators[i] must hold a duplicate of MPI_COMM_WORLD, one per
// work group.
std::vector<MPI_Comm> communicators;
std::mutex            my_mutex;

// Function which will be called by several threads on each process.
void my_fun()
{
  int rank;
  int thread_id = 0; // Your method of getting a thread id/work group here:
                     // iteration number, sub-step, or what have you.

  MPI_Comm_rank(communicators[thread_id], &rank);

  std::unique_lock<std::mutex> lock;
  if (rank == 0)
    {
      // Only rank 0 serializes its threads; whichever thread holds my_mutex
      // decides which work group's barrier completes next.
      std::unique_lock<std::mutex> lock_(my_mutex);
      MPI_Barrier(communicators[thread_id]);
      lock.swap(lock_); // keep holding my_mutex until this function returns
    }
  else
    {
      MPI_Barrier(communicators[thread_id]);
    }

  // Call to a Trilinos or PETSc function (like a matrix-vector multiply)
  // which is communicated over MPI_COMM_WORLD here.

  // Now exit the function or otherwise release the lock.
}
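In case it is useful, here is a minimal sketch of how the communicators array
above can be filled (setup_communicators and n_work_groups are just
illustrative names): each work group gets its own duplicate of MPI_COMM_WORLD
via MPI_Comm_dup, created once from a single thread before any worker threads
are launched.

// Create the duplicated communicators once, from a single thread, before any
// worker threads start (fills the `communicators` vector from above).
void setup_communicators(const unsigned int n_work_groups)
{
  communicators.resize(n_work_groups);
  for (unsigned int i = 0; i < n_work_groups; ++i)
    MPI_Comm_dup(MPI_COMM_WORLD, &communicators[i]);

  // Each duplicate should eventually be released with MPI_Comm_free before
  // MPI_Finalize.
}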
4) The bug I was having was actually unrelated to any concurrency issues; it
was instead due to mistakenly renumbering DoFs component-wise in parallel
computations.
5) I also have no idea why twice as many threads as requested are being
spawned, but as suggested, only half of them ever take up compute resources,
so the impact is probably minimal. I am not using any libraries external to
deal.II, and I am strictly using the dealii::Threads functions for launching
threads (sketched below). This happens even when running the tutorial steps
in serial.
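For what it's worth, the launching I am talking about is nothing more exotic
than the following kind of pattern (hypothetical helper names, just to
illustrate the deal.II tasking interface I use):

#include <deal.II/base/thread_management.h>

void do_substep_0(); // placeholder work functions, e.g. calling my_fun()
void do_substep_1(); // above with work group ids 0 and 1 respectively

void run_substeps()
{
  dealii::Threads::Task<> t0 = dealii::Threads::new_task(&do_substep_0);
  dealii::Threads::Task<> t1 = dealii::Threads::new_task(&do_substep_1);

  t0.join();
  t1.join();
}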
6) One quick comment: in debug mode, deal.II throws an assertion about mixing
threads and MPI with PETSc. However, I do not actually intend to do any
multithreaded reading from or writing to the PETSc data structures; I am
using threads for extra parallelism on top of the FEM parallelism, so the
assertion is a bit inconvenient. But I understand my use case is uncommon.
Thank you for all the help.
Regards,
Kyle
On Friday, May 24, 2024 at 11:23:53 PM UTC-4 Wolfgang Bangerth wrote:
> On 5/24/24 19:17, Kyle Schwiebert wrote:
> >
> > I do actually have one question here that may be relevant. Whenever I am
> > checking things out in the gdb, it claims I have twice as many threads
> > running as I asked for using MultithreadInfo::set_max_threads(). Is this
> > possibly germane to my issue here and what is the cause of this? As is
> > common, my CPU has two logical cores per physical core, but the CPU
> > utilization suggests that only one thread of each core is ever being used
> > at any given time.
>
> I don't have a good answer for this. It is possible that you are linking
> (directly or indirectly) with a library that is built with OpenMP, which
> creates its own set of worker threads, and then deal.II uses TBB, which also
> creates its own set of worker threads. In practice, you will likely only
> ever see one or the other of these worker threads being active.
>
> Best
> W.
>
> --
> ------------------------------------------------------------------------
> Wolfgang Bangerth email: [email protected]
> www: http://www.math.colostate.edu/~bangerth/
>
>
>