Nathan,
Sorry, I’m moving and will not have the time to thoughtfully review your patch
before March. I quickly glanced over and things look OK, but I had no time to
validate/test it.
There is one minor thing I would suggest to change. In your patch
in_unexpected_list is defined as a bool, which translates to an int on most
platforms. You can change it to an uint8_t and move the in_unexpected_list
field in the mca_pml_ob1_comm_proc_t to allow the compiler to pack it with the
expected_sequence. The final structure should look like this:
struct mca_pml_ob1_comm_proc_t {
opal_list_item_t super; /**< allow this proc to be kept in a list */
uint16_t expected_sequence; /**< send message sequence number - receiver
side */
uint8_t in_unexpected_list; /**< the proc has pending unexpected
messages */
struct ompi_proc_t* ompi_proc;
#if OPAL_ENABLE_MULTI_THREADS
volatile int32_t send_sequence; /**< send side sequence number */
#else
int32_t send_sequence; /**< send side sequence number */
#endif
opal_list_t frags_cant_match; /**< out-of-order fragment queues */
opal_list_t specific_receives; /**< queues of unmatched specific receives */
opal_list_t unexpected_frags; /**< unexpected fragment queues */
};
This will reduce the size of the mca_pml_ob1_comm_proc_t structure by 4 bytes.
This structure is used to store information about each processor for each
communicator, so the gain might be substantial at scale. Overall not a big win
compared with the fact that you changed the super from an opal_object_t to an
opal_list_item_t (doubling the amount of memory requested). But still better
than nothing…
Thanks,
George.
On Feb 19, 2014, at 00:56 , Nathan Hjelm <[email protected]> wrote:
> On Tue, Feb 11, 2014 at 01:43:37AM +0100, George Bosilca wrote:
>>
>> The class is only usable in the context of a single .c file. As a code
>> protection it makes perfect sense to me.
>
> Ah, yes. So it is. Fixed in the latest patch.
>
>> It’s not yet, and I did not notice an RFC about. The event I was referring
>> to is only generated when the message is first noticed. In the particular
>> instance affected by your patch it has been delayed until the communicator
>> is created locally, but it still have to be generated once.
>
> The problem is the message is not generated once but twice with
> add_fragment_to_unexpected where it is. One message is generated when
> an out of order packet is processed by the outer loop (it is put into
> the out of order list) then another time when it is processed by the
> inter loop jumping to the add_fragment_to_unexpected. This has no affect
> on the iprobe optimization so I have dropped it from my proposed patch.
>
>> The size check and the removal from the list is still in the critical path.
>> At some point we were down to few hundreds of nano-sec, enough to get bugged
>> by one extra memory reference.
>
> I modified the patch to only remove procs from the unexpected_procs list
> when matching wildcard receive requests. This way there are no extra
> instructions in the critical path. It will make probe a little slower
> than the previous patch but that is ok. I see no degredation with simple
> pt2pt benchmarks with vader. Please take a look and let me know what you
> think.
>
> -Nathan
> <iprobe_patch_v2.patch>_______________________________________________
> devel mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel