Dear Martin,

my local machine is tied up with a Valgrind run at the moment, but as soon as 
one step of that is done I will put these changes in right away and post 
the results here (<6 hrs).
From what I make of the call stacks, one process somehow gets out of the 
SameAs() call without being MPI-blocked, and the others are then forced to 
wait during the Allreduce call. How or where that happens I will try to 
figure out later today. SDM is now working well in my Eclipse setup and I 
hope to be able to track down the problem.

Best,
Pascal

On Thursday, 16 March 2017 08:58:53 UTC+1, Martin Kronbichler wrote:
>
> Dear Pascal,
>
> You are right, in your case one needs to call 
>
> GrowingVectorMemory<TrilinosWrappers::MPI::BlockVector>::release_unused_memory()
> rather than the version for the plain vector. Can you try that as well?
>
> The problem appears to be that the call to SameAs returns different 
> results on different processors, which it should not. That is why I 
> suspect that there might be a stale communicator object around. Another 
> indication for that assumption is that you get stuck in the initialization 
> of the temporary vectors of the GMRES solver, which is exactly this kind of 
> situation.
>
> As to the particular patch I referred to: It does release some memory that 
> might have stale information but it also changes some of the call 
> structures slightly. Could you try to change the following:
>
> if (vector->Map().SameAs(v.vector->Map()) == false)
>
> to 
>
> if (v.vector->Map().SameAs(vector->Map()) == false)
>
> Best, Martin 
> On 16.03.2017 01:28, Pascal Kraft wrote: 
>
> Hi Martin,
> that didn't solve my problem. What I have done in the meantime is replace 
> the check in line 247 of trilinos_vector.cc with true. I don't know if this 
> causes memory leaks or anything but my code seems to be working fine with 
> that change. 
> To your suggestion: Would I have also had to call the templated version 
> for BlockVectors or only for Vectors? I only tried the latter. Would I have 
> had to also apply some patch to my dealii library for it to work or is the 
> patch you talked about simply that you included the functionality of the 
> call 
> GrowingVectorMemory<TrilinosWrappers::MPI::Vector>::release_unused_memory() 
> in some places?
> I have also wanted to try MPICH instead of OpenMPI because of a post about 
> an internal error in OpenMPI and one of the functions appearing in the call 
> stacks sometimes not blocking properly.
> Thank you for your time and your fast responses - the whole library and 
> the people developing it and making it available are simply awesome ;)
> Pascal
> On Wednesday, 15 March 2017 17:26:23 UTC+1, Martin Kronbichler wrote:
>>
>> Dear Pascal,
>>
>> This problem seems related to a problem we recently worked around in 
>> https://github.com/dealii/dealii/pull/4043
>>
>> Can you check what happens if you call 
>> GrowingVectorMemory<TrilinosWrappers::MPI::Vector>::release_unused_memory()
>>
>> between your optimization steps? If a communicator gets stuck in those 
>> places, it is likely a stale object somewhere that we fail to work around 
>> for some reason.
>>
>> Best, Martin 
>> On 15.03.2017 14:10, Pascal Kraft wrote: 
>>
>> Dear Timo, 
>> I have done some more digging and found out the following. The problems 
>> seem to happen in trilinos_vector.cc between the lines 240 and 270.
>> What I see on the call stacks is that one process reaches line 261 
>> ( ierr = vector->GlobalAssemble (last_action); ) and then waits inside this 
>> call at an MPI_Barrier with the following stack:
>> 20 <symbol is not available> 7fffd4d18f56 
>> 19 opal_progress()  7fffdc56dfca 
>> 18 ompi_request_default_wait_all()  7fffddd54b15 
>> 17 ompi_coll_tuned_barrier_intra_recursivedoubling()  7fffcf9abb5d 
>> 16 PMPI_Barrier()  7fffddd68a9c 
>> 15 Epetra_MpiDistributor::DoPosts()  7fffe4088b4f 
>> 14 Epetra_MpiDistributor::Do()  7fffe4089773 
>> 13 Epetra_DistObject::DoTransfer()  7fffe400a96a 
>> 12 Epetra_DistObject::Export()  7fffe400b7b7 
>> 11 int Epetra_FEVector::GlobalAssemble<int>()  7fffe4023d7f 
>> 10 Epetra_FEVector::GlobalAssemble()  7fffe40228e3 
>> The other (in my case three) processes are stuck in the head of the 
>> if/else-if statement leading up to this point, namely in the line 
>> if (vector->Map().SameAs(v.vector->Map()) == false) 
>> inside the call to SameAs(...) with stacks like
>> 15 opal_progress()  7fffdc56dfbc 
>> 14 ompi_request_default_wait_all()  7fffddd54b15 
>> 13 ompi_coll_tuned_allreduce_intra_recursivedoubling()  7fffcf9a4913 
>> 12 PMPI_Allreduce()  7fffddd6587f 
>> 11 Epetra_MpiComm::MinAll()  7fffe408739e 
>> 10 Epetra_BlockMap::SameAs()  7fffe3fb9d74 
>> Maybe this helps. Producing a smaller example will likely not be possible 
>> in the coming two weeks but if there are no solutions until then I can try.
>> Greetings,
>> Pascal
>> -- The deal.II project is located at http://www.dealii.org/ For mailing 
>> list/forum options, see https://groups.google.com/d/forum/dealii?hl=en 
>> --- You received this message because you are subscribed to the Google 
>> Groups "deal.II User Group" group. To unsubscribe from this group and stop 
>> receiving emails from it, send an email to dealii+un...@googlegroups.com. 
>> For more options, visit https://groups.google.com/d/optout. 
>>
