Nathan,

Sorry, I’m moving and will not have the time to thoughtfully review your patch 
before March. I quickly glanced over and things look OK, but I had no time to 
validate/test it.

There is one minor thing I would suggest to change. In your patch 
in_unexpected_list is defined as a bool, which translates to an int on most 
platforms. You can change it to an uint8_t and move the in_unexpected_list 
field in the mca_pml_ob1_comm_proc_t to allow the compiler to pack it with the 
expected_sequence. The final structure should look like this:

struct mca_pml_ob1_comm_proc_t {
    opal_list_item_t super;        /**< allow this proc to be kept in a list */
    uint16_t expected_sequence;    /**< send message sequence number - receiver 
side */
    uint8_t  in_unexpected_list;   /**< the proc has pending unexpected 
messages */
    struct ompi_proc_t* ompi_proc;
#if OPAL_ENABLE_MULTI_THREADS
    volatile int32_t send_sequence; /**< send side sequence number */
#else
    int32_t send_sequence; /**< send side sequence number */
#endif
    opal_list_t frags_cant_match;  /**< out-of-order fragment queues */
    opal_list_t specific_receives; /**< queues of unmatched specific receives */
    opal_list_t unexpected_frags;  /**< unexpected fragment queues */
};

This will reduce the size of the mca_pml_ob1_comm_proc_t structure by 4 bytes. 
This structure is used to store information about each processor for each 
communicator, so the gain might be substantial at scale. Overall not a big win 
compared with the fact that you changed the super from an opal_object_t to an 
opal_list_item_t (doubling the amount of memory requested). But still better 
than nothing…

Thanks,
  George.



On Feb 19, 2014, at 00:56 , Nathan Hjelm <hje...@lanl.gov> wrote: 

> On Tue, Feb 11, 2014 at 01:43:37AM +0100, George Bosilca wrote:
>> 
>> The class is only usable in the context of a single .c file. As a code 
>> protection it makes perfect sense to me.
> 
> Ah, yes. So it is. Fixed in the latest patch.
> 
>> It’s not yet, and I did not notice an RFC about. The event I was referring 
>> to is only generated when the message is first noticed. In the particular 
>> instance affected by your patch it has been delayed until the communicator 
>> is created locally, but it still have to be generated once. 
> 
> The problem is the message is not generated once but twice with
> add_fragment_to_unexpected where it is. One message is generated when
> an out of order packet is processed by the outer loop (it is put into
> the out of order list) then another time when it is processed by the
> inter loop jumping to the add_fragment_to_unexpected. This has no affect
> on the iprobe optimization so I have dropped it from my proposed patch.
> 
>> The size check and the removal from the list is still in the critical path. 
>> At some point we were down to few hundreds of nano-sec, enough to get bugged 
>> by one extra memory reference.
> 
> I modified the patch to only remove procs from the unexpected_procs list
> when matching wildcard receive requests. This way there are no extra
> instructions in the critical path. It will make probe a little slower
> than the previous patch but that is ok. I see no degredation with simple
> pt2pt benchmarks with vader. Please take a look and let me know what you
> think.
> 
> -Nathan
> <iprobe_patch_v2.patch>_______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

Reply via email to