I do actually have one question here that may be relevant. Whenever I 
inspect the program in gdb, it claims I have twice as many threads running 
as I asked for via MultithreadInfo::set_max_threads(). Could this be 
germane to my issue, and what causes it? As is common, my CPU has two 
logical cores per physical core, but the CPU utilization suggests that only 
one thread per core is ever in use at any given time.

On Friday, May 24, 2024 at 7:28:05 PM UTC-4 Kyle Schwiebert wrote:

> Dr. Bangerth,
>
> Thank you very much for the help. My problem size is not so large as to 
> make copying the matrix impossible, but it is still undesirable and would 
> require a significant rewrite of my code. Since I plan to eventually scale 
> up, this is not a long-term solution.
>
> I am also not sure MPI_THREAD_MULTIPLE works anyway. I attempted an 
> alternative approach, protecting all calls to objects with a shared 
> communicator behind a mutex, and ran into an odd issue: the 
> TrilinosWrappers::MPI::Vector::Vector() constructor seems to be causing 
> segfaults. I do not see how this could be the user's fault, since there 
> are no input parameters. This happens even when I run with only one 
> thread, unless I switch back to MPI_THREAD_SERIALIZED. I was able to 
> verify that the MPI implementation I have can provide 
> MPI_THREAD_MULTIPLE. As I am somewhat new to MPI, I think I'm at the end 
> of what I can try.
>
> Thank you again for your time in addressing my question.
>
> Regards,
> Kyle
>
> On Tuesday, May 21, 2024 at 11:42:12 AM UTC-4 Wolfgang Bangerth wrote:
>
>>
>> Kyle: 
>>
>> > I have a question to which I think the answer may be "no" but I 
>> > thought I would ask. I'll just ask and then explain the "why?" at the 
>> > end in case there is a better workaround from the outset. 
>> > 
>> > I am initializing MPI myself with MPI_THREAD_MULTIPLE so that threads 
>> > can each call MPI functions without interfering. To the extent 
>> > possible, each thread has its own copy of MPI_COMM_WORLD so that 
>> > simultaneous calls do not get conflated. However, I have a matrix of 
>> > type TrilinosWrappers::SparseMatrix which BOTH threads need 
>> > simultaneous access to. Since you must give one and only one MPI_Comm 
>> > object in the constructor, these sorts of conflicts are inevitable. 
>> > 
>> > For obvious reasons I would not like to require a copy of this matrix 
>> > for each thread. The other obvious solution is a mutex on the matrix, 
>> > but this could easily get costly, as both threads call 
>> > matrix.vmult(...) in an iterative solver. I thus have two questions: 
>> > 
>> > 1) Is initializing MPI with MPI_THREAD_MULTIPLE going to break the 
>> > deal.II internals for some reason, such that I should just not 
>> > investigate this further? 
>>
>> This should work. deal.II uses MPI_THREAD_SERIALIZED internally. 
>>
>>
>> > 2) I think the best solution, if possible, would be to get pointers to 
>> > the internal data of my matrix, which I can then associate with 
>> > different MPI_Comm objects. Is this possible? 
>>
>> No. You should never try to use anything but the public interfaces of 
>> classes. Anything else is bound to break things in unpredictable ways 
>> sooner or later. Probably sooner. 
>>
>>
>> > Why am I doing this? 
>> > This is a bit of a simplification, but imagine that I am solving a 
>> > linear deferred correction problem. This means at each time step I 
>> > solve A . x_1 = b_1 and A . x_2 = b_2. Let us assume that the matrix A 
>> > does not have a well-known preconditioner which scales nicely with the 
>> > number of processors. Then instead of using 2n processors on each 
>> > linear system in series, we could instead use n processors on each 
>> > linear system simultaneously and expect this to be faster. I hope this 
>> > makes sense. 
>>
>> Yes, this makes sense. But you should not expect to be able to solve 
>> multiple linear systems at the same time over the same communicator. 
>> Each step in a linear solver (vector dot products, matrix-vector 
>> products, etc.) consists of multiple MPI messages where processes wait 
>> for data to be sent from other processes. If you have multiple solves 
>> running on the same process, you will receive messages in unpredictable 
>> orders that may or may not belong to the current solve. Nothing good can 
>> come of this. 
>>
>> But if the linear solve is the bottleneck, you can always build the 
>> matrix multiple times (or create copies) with different 
>> (sub-)communicators and run one solve on each communicator. 
>>
>> Best 
>> W. 
>>
>> -- 
>> ------------------------------------------------------------------------ 
>> Wolfgang Bangerth email: [email protected] 
>> www: http://www.math.colostate.edu/~bangerth/ 
>>
>>
>>

-- 
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dealii/7ff0db43-0f34-4c82-ac8b-2f3e89561f6fn%40googlegroups.com.
