Rainer Keller wrote:
Hi Terry,
On Wednesday 22 August 2007 16:22, Terry D. Dontje wrote:
I thought I would run this by the group before trying to unravel the
code and figure out how to fix the problem. From some experimentation,
it looks to me as though, when a process matches an unexpected message,
the PERUSE framework incorrectly fires a
PERUSE_COMM_MSG_MATCH_POSTED_REQ event in addition to a
PERUSE_COMM_REQ_MATCH_UNEX event. I believe this is wrong: the
former event should not be fired in this case.
You are right, the former event PERUSE_COMM_MSG_MATCH_POSTED_REQ should not be
fired, as this was an unexpected message.
If the above assumption is true, I think the problem arises because the
PERUSE_COMM_MSG_MATCH_POSTED_REQ event is fired in the function
mca_pml_ob1_recv_request_progress, which is called by
mca_pml_ob1_recv_request_match_specific when a match of an unexpected
message has occurred. I am wondering if the
PERUSE_COMM_MSG_MATCH_POSTED_REQ event should be moved to a more
posted-queue-centric routine, something like mca_pml_ob1_recv_frag_match?
I believe this is correct -- at least this works for a large-message
late-sender and late-receiver test program, mpi_peruse.c.
This should be fixed with the committed patch v15947.
Actually, there are two other items. One is a missing
PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q...
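The event placement discussed above can be sketched as a small toy model. This is not the ob1 code: every toy_* name is hypothetical and only mirrors the routines named in this thread, and the event log stands in for the real PERUSE callbacks. The point it illustrates is that the shared progress routine fires no match event, so each matching path reports only its own event.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two PERUSE events discussed in the thread. */
typedef enum {
    TOY_REQ_MATCH_UNEX,        /* models PERUSE_COMM_REQ_MATCH_UNEX */
    TOY_MSG_MATCH_POSTED_REQ   /* models PERUSE_COMM_MSG_MATCH_POSTED_REQ */
} toy_event_t;

#define TOY_MAX_EVENTS 8

/* A tiny event log that replaces real PERUSE callbacks in this sketch. */
typedef struct {
    toy_event_t events[TOY_MAX_EVENTS];
    size_t count;
} toy_log_t;

void toy_log_init(toy_log_t *log) {
    assert(log != NULL);
    log->count = 0;
}

void toy_fire(toy_log_t *log, toy_event_t ev) {
    assert(log != NULL && log->count < TOY_MAX_EVENTS);
    log->events[log->count++] = ev;
}

/* Shared by both paths (models mca_pml_ob1_recv_request_progress).
 * Because both paths reach it, it must not fire a match event itself. */
void toy_recv_request_progress(toy_log_t *log) {
    (void)log; /* data transfer would be driven here */
}

/* Receive-side path (models mca_pml_ob1_recv_request_match_specific):
 * a newly posted receive finds its message in the unexpected queue. */
void toy_recv_request_match_specific(toy_log_t *log) {
    toy_fire(log, TOY_REQ_MATCH_UNEX);
    toy_recv_request_progress(log);
}

/* Arrival-side path (models mca_pml_ob1_recv_frag_match):
 * an incoming fragment finds a receive in the posted queue. */
void toy_recv_frag_match(toy_log_t *log) {
    toy_fire(log, TOY_MSG_MATCH_POSTED_REQ);
    toy_recv_request_progress(log);
}
```

Calling toy_recv_request_match_specific on a fresh log records a single TOY_REQ_MATCH_UNEX entry, and calling toy_recv_frag_match records a single TOY_MSG_MATCH_POSTED_REQ entry, which is the behaviour the thread argues for.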
This works for large posted messages, but when the posted message is small
you don't see the unexpected messages at all now.
--td
Additionally, we have a problem that we fire the PERUSE_COMM_REQ_ACTIVATE event
for MPI_*Probe function calls. The solution is to move the firing of that event from
pml_base_sendreq.h / pml_base_recv_req.h
into
pml_ob1_isend.c and pml_ob1_irecv.c, respectively.
Please see v15945.
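As a rough illustration of why moving the event helps, here is a toy model. It assumes that probes and receives share the same base request setup, which is what the symptom above suggests. All demo_* names are hypothetical and none of this is the actual ob1 code.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the modelled PERUSE_COMM_REQ_ACTIVATE event fires. */
typedef struct {
    int activate_events;
} demo_counters_t;

/* Shared request setup (models the code in the pml_base_*.h headers).
 * It is assumed here to be reached by probes as well as receives, so
 * firing the event at this point would also report probes. */
void demo_base_request_start(demo_counters_t *c) {
    assert(c != NULL);
    /* no event fired here after the move */
}

/* Models the receive entry point in pml_ob1_irecv.c: the event is
 * fired here, so only real receives are reported as activated. */
void demo_irecv(demo_counters_t *c) {
    assert(c != NULL);
    demo_base_request_start(c);
    c->activate_events++;   /* models PERUSE_COMM_REQ_ACTIVATE */
}

/* Models a probe call: it goes through the shared setup but never
 * reaches the point where the event is fired. */
void demo_iprobe(demo_counters_t *c) {
    assert(c != NULL);
    demo_base_request_start(c);
}
```

With the event fired only in the entry points, a probe followed by a receive leaves the counter at one, not two.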
With best regards,
Rainer