I wanted to follow up on this since I was able to find a solution which
works for me. Hopefully it will help someone :)
Thank you very much for the help.
1) As suggested by Dr. Bangerth above, MPI_THREAD_MULTIPLE can work fine
with deal.II if you are willing to handle the initialization of MPI and the
libraries yourself (sketched below).
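For reference, a minimal sketch of what I mean by handling the initialization
myself, assuming only MPI needs to be set up (if you also rely on
Utilities::MPI::MPI_InitFinalize to initialize PETSc, p4est, etc., those would
have to be initialized by hand as well):

#include <mpi.h>

#include <cstdlib>
#include <iostream>

int main(int argc, char *argv[])
{
  int provided = MPI_THREAD_SINGLE;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE)
    {
      std::cerr << "This MPI does not support MPI_THREAD_MULTIPLE." << std::endl;
      MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }

  // ... initialize any other libraries, set the deal.II thread limit, and
  // run the program as usual ...

  MPI_Finalize();
  return 0;
}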
2) Also as suggested above, a typical use of a serial mutex (std::mutex), a
parallel mutex (Utilities::MPI::CollectiveMutex), or a combination thereof
will not do the trick (a sketch of why is just below).
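To illustrate, here is a sketch of the straightforward "parallel mutex"
version, assuming the usual Utilities::MPI::CollectiveMutex::ScopedLock
interface. Its problem is that the thread which happens to win the lock can
differ from process to process, so the collective calls underneath need not
match up across ranks:

#include <deal.II/base/mpi.h>

void my_fun_naive(const MPI_Comm comm)
{
  // One CollectiveMutex shared by all threads on this process.
  static dealii::Utilities::MPI::CollectiveMutex mutex;
  dealii::Utilities::MPI::CollectiveMutex::ScopedLock lock(mutex, comm);

  // Collective Trilinos/PETSc call here: only one thread per process gets in,
  // but not necessarily the *same* work group on every process.
}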
3) However, while not extensively tested, I have run into no issues with
something like the code below. The idea is that with a parallel mutex we have
no control over which thread gets access to the shared communicator, only that
exactly one thread on each process gets access. The key is to put the root
process in charge of which work group/thread id gets released on every
process. Put more simply: Utilities::MPI::CollectiveMutex lets one thread
through; my custom mutex lets a specific thread through.
I was a little surprised to see that neither this way of solving the problem
nor moving to MPI_THREAD_MULTIPLE caused a performance hit, and the result is
in fact a sizable improvement over MPI-only runs, but I am not currently
scaling to a large machine, so this could change.
#include <mpi.h>

#include <mutex>
#include <vector>

// Must be accessible to all threads which might enter my_fun() on each MPI
// process; communicators[i] must hold a duplicate of MPI_COMM_WORLD, one per
// work group.
std::vector<MPI_Comm> communicators;
std::mutex            my_mutex;

// Function which will be called by several threads on each process.
void my_fun()
{
  int rank;
  int thread_id = 0; // Your method of getting a thread id/work group here:
                     // iteration number, sub-step, or what have you.

  MPI_Comm_rank(communicators[thread_id], &rank);

  std::unique_lock<std::mutex> lock;
  if (rank == 0)
    {
      // Only rank 0 serializes its threads; whichever thread holds my_mutex
      // decides which work group's barrier completes next.
      std::unique_lock<std::mutex> lock_(my_mutex);
      MPI_Barrier(communicators[thread_id]);
      lock.swap(lock_); // keep holding my_mutex until this function returns
    }
  else
    {
      MPI_Barrier(communicators[thread_id]);
    }

  // Call to a Trilinos or PETSc function (like a matrix-vector multiply)
  // which is communicated over MPI_COMM_WORLD here.

  // Now exit the function or otherwise release the lock.
}
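In case it is useful, here is a minimal sketch of how the communicators array
above can be filled (setup_communicators and n_work_groups are just
illustrative names): each work group gets its own duplicate of MPI_COMM_WORLD
via MPI_Comm_dup, created once from a single thread before any worker threads
are launched.

// Create the duplicated communicators once, from a single thread, before any
// worker threads start (fills the `communicators` vector from above).
void setup_communicators(const unsigned int n_work_groups)
{
  communicators.resize(n_work_groups);
  for (unsigned int i = 0; i < n_work_groups; ++i)
    MPI_Comm_dup(MPI_COMM_WORLD, &communicators[i]);

  // Each duplicate should eventually be released with MPI_Comm_free before
  // MPI_Finalize.
}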
4) The bug I was having was actually unrelated to any concurrency issues; it
was instead due to mistakenly renumbering DoFs component-wise in parallel
computations.
5) I also have no idea why twice as many threads as requested are being
spawned, but as suggested, only half of them ever take up compute resources,
so the impact is probably minimal. I am not using any libraries external to
deal.II, and I am strictly using the dealii::Threads functions for launching
threads (sketched below). This happens even when running the tutorial steps
in serial.
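For what it's worth, the launching I am talking about is nothing more exotic
than the following kind of pattern (hypothetical helper names, just to
illustrate the deal.II tasking interface I use):

#include <deal.II/base/thread_management.h>

void do_substep_0(); // placeholder work functions, e.g. calling my_fun()
void do_substep_1(); // above with work group ids 0 and 1 respectively

void run_substeps()
{
  dealii::Threads::Task<> t0 = dealii::Threads::new_task(&do_substep_0);
  dealii::Threads::Task<> t1 = dealii::Threads::new_task(&do_substep_1);

  t0.join();
  t1.join();
}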
6) One quick comment: in debug mode, deal.II throws an assertion about mixing
threads and MPI with PETSc. However, I do not actually intend to do any
multithreaded reading from or writing to the PETSc data structures; I am
using threads for extra parallelism on top of the FEM parallelism, so the
assertion is a bit inconvenient. But I understand my use case is uncommon.
Thank you for all the help.
Regards,
Kyle
On Friday, May 24, 2024 at 11:23:53 PM UTC-4 Wolfgang Bangerth wrote:
> On 5/24/24 19:17, Kyle Schwiebert wrote:
> >
> > I do actually have one question here that may be relevant. Whenever I am
> > checking things out in the gdb, it claims I have twice as many threads
> > running as I asked for using MultithreadInfo::set_max_threads(). Is this
> > possibly germane to my issue here and what is the cause of this? As is
> > common, my CPU has two logical cores per physical core, but the CPU
> > utilization suggests that only one thread of each core is ever being used
> > at any given time.
>
> I don't have a good answer for this. It is possible that you are linking
> (directly or indirectly) with a library that is built with OpenMP, which
> creates its own set of worker threads, and then deal.II uses TBB, which also
> creates its own set of worker threads. In practice, you will likely only
> ever see one or the other of these worker threads being active.
>
> Best
> W.
>
> --
> ------------------------------------------------------------------------
> Wolfgang Bangerth email: [email protected]
> www: http://www.math.colostate.edu/~bangerth/
>
>
>