After some digging, Terry and I discovered the problem with r26626. To perform 
an rdma transaction, the pmls used to explicitly promote the seg_addr from 
prepare_src/dst to 64 bits before sending it over the wire. The other end would 
then (inconsistently) use the lval to perform the get/put. Segments are now 
opaque objects so the pmls simply memcpy the segments into the rdma header 
(without promoting seg_addr). So, right now we have a mixture of lvals and 
pvals in the put and get paths, which will not work in two cases: 32-bit 
environments and mixed 32/64-bit environments.
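
To make the failure concrete, here is a minimal sketch of what I think is 
happening. None of this is the actual btl code: the sketch_* types and 
functions are hypothetical stand-ins, and pval32 models a 32-bit pointer so 
the example behaves the same on any build machine. It relies on the bytes not 
covered by pval32 keeping their old contents, which is what compilers do in 
practice.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the segment address union. The field names
 * follow the lval/pval terminology; pval32 models a 32-bit void *. */
typedef union {
    uint64_t lval;    /* 64-bit view, read by the remote side */
    uint32_t pval32;  /* pointer-sized view on a 32-bit host */
} sketch_addr_t;

/* Hypothetical segment: an address plus a length. */
typedef struct {
    sketch_addr_t seg_addr;
    uint32_t      seg_len;
} sketch_segment_t;

/* What a 32-bit prepare_src/dst effectively does: it writes only the
 * pointer-sized view and leaves the other four bytes untouched. */
static void sketch_prepare(sketch_segment_t *seg, uint32_t addr, uint32_t len)
{
    seg->seg_addr.pval32 = addr;
    seg->seg_len = len;
}

/* Copy the segment verbatim into an rdma header buffer (no promotion). */
static void sketch_pack(void *hdr, const sketch_segment_t *seg)
{
    assert(hdr != NULL && seg != NULL);
    memcpy(hdr, seg, sizeof(*seg));
}

/* The receiver reads the 64-bit view out of the header. */
static uint64_t sketch_remote_lval(const void *hdr)
{
    sketch_segment_t seg;
    memcpy(&seg, hdr, sizeof(seg));
    return seg.seg_addr.lval;
}
```

If the segment memory held stale data before sketch_prepare ran, the value 
returned by sketch_remote_lval will generally not equal the address that was 
stored through pval32.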

I can think of a few ways to fix this:

 - Require the pmls to explicitly promote seg_addr to 64 bits after the memcpy. 
This is a band-aid fix, but I can implement and commit it very quickly (it will 
work fine until a more permanent solution is found).
 - Require prepare_src/dst to return segments with 64-bit addresses for all 
rdma fragments (0 == reserve). This is relatively simple for most btls but a 
little more complicated for openib. The openib btl may pack data for a get/put 
into a send segment. The obvious way to handle this case is to set the lval in 
prepare_src and restore the pval when the send fragment is returned.
 - Change the btl interface in a way that allows the btl to prepare segments 
specifically to be sent to another machine. This is a bit more complicated and 
would require lots of discussion and an RFC.
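
For reference, a rough sketch of what the first two options could look like. 
The demo_* names are made up for illustration and are not the real pml/btl 
types or functions. demo_pack_promoted promotes the address in the copy that 
goes into the header (first option); demo_set_lval and demo_restore_pval show 
the set/restore pair that openib would need (second option).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, not the real btl definitions. */
typedef union {
    uint64_t lval;  /* 64-bit view that goes over the wire */
    void    *pval;  /* native pointer view used locally */
} demo_addr_t;

typedef struct {
    demo_addr_t seg_addr;
    uint64_t    seg_len;
} demo_segment_t;

/* First option: copy the segment, then promote the address in the copy
 * so the header always carries a full 64-bit value. */
static void demo_pack_promoted(void *hdr, const demo_segment_t *seg)
{
    demo_segment_t wire;

    assert(hdr != NULL && seg != NULL);
    wire = *seg;
    /* Zero-extends on a 32-bit host; leaves the value unchanged on 64-bit. */
    wire.seg_addr.lval = (uint64_t)(uintptr_t)seg->seg_addr.pval;
    memcpy(hdr, &wire, sizeof(wire));
}

/* Second option, prepare side: store the 64-bit form in the segment. */
static void demo_set_lval(demo_segment_t *seg)
{
    void *p = seg->seg_addr.pval;
    seg->seg_addr.lval = (uint64_t)(uintptr_t)p;
}

/* Second option, return side: restore the native pointer before the
 * fragment is reused locally. */
static void demo_restore_pval(demo_segment_t *seg)
{
    uint64_t v = seg->seg_addr.lval;
    seg->seg_addr.pval = (void *)(uintptr_t)v;
}
```

The first option leaves the local segment untouched, which is why it is the 
least invasive. The second modifies the segment in place, so the restore step 
has to run before anything local dereferences pval again.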

I am open to suggestions.

-Nathan
