Hi,
I'm experiencing the following problem with MPICH-G2: when I run a
multi-node job, the "Isend" operation doesn't run to completion and gives
the following error message:
ERROR: MPID_Get_count: could not interpret status->private_count 145538180
0 - MPI_GET_COUNT : Internal MPI error!
Aborting with code 16
MPICH-G2: read failure - globus_xio: System error in read: Connection reset by peer, state=await_format
MPICH-G2: read failure - globus_xio: System error in read: Connection reset by peer, state=await_format
MPICH-G2: read failure - globus_xio: System error in read: Connection reset by peer, state=await_format
MPICH-G2: read failure - globus_xio: System error in read: Connection reset by peer, state=await_format
ERROR: MPID_Abort: failed remote globus_gram_client_job_cancel to job contact >https://cs2.cse.oar.net:51767/14800/1189625960/<
Caught broken pipe signal. Connection to server may be down
MPICH-G2: ERROR: prime_the_line: connect failed
MPICH-G2: read failure - globus_xio: System error in read: Connection reset by peer, state=await_format
ERROR: MPID_Abort: failed remote globus_gram_client_job_cancel to job contact >https://cs2.cse.oar.net:51767/14800/1189625960/<
ERROR: MPID_Abort: failed remote globus_gram_client_job_cancel to job contact >https://cs2.cse.oar.net:51767/14800/1189625960/<
ERROR: MPID_Abort: failed remote globus_gram_client_job_cancel to job contact >https://cs3.cse.oar.net:55840/14766/1189625960/<
ERROR: MPID_Abort: failed remote globus_gram_client_job_cancel to job contact >https://cs4.cse.oar.net:59909/3581/1189625960/<
I've searched the internet for a couple of days and found that other
people have reported this problem, but no one has posted a solution.
I'm using Globus 4.0.4 and MPICH 1.2.7.
Does anyone have any idea what could be causing this?
Thanks in advance for your help,
~leo