Ashley Pittman wrote:

Do you have a stack trace of your hung application to hand? In
particular, when you say "All processes have made the same call to MPI_Allreduce. The processes are all in opal_progress, called (with intervening calls) by MPI_Allreduce.",
do the intervening calls include mca_coll_sync_bcast,
ompi_coll_tuned_barrier_intra_dec_fixed and
ompi_coll_tuned_barrier_intra_recursivedoubling?

I don't have a stack trace handy, and today is pretty full. I'll try to make some time to document what I've got in the next few days. I was able to hang a C translation of Ralph's reproducer as well.

        - Bryan

--
Bryan Lally, la...@lanl.gov
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico
