Folks,

I found (at least) two issues with oshmem put when btl/vader is used with knem enabled:
$ oshrun -np 2 --mca btl vader,self ./oshmem_max_reduction
--------------------------------------------------------------------------
SHMEM_ABORT was invoked on rank 0 (pid 11936, host=soleil) with errorcode -1.
--------------------------------------------------------------------------
[soleil.iferc.local:11934] 1 more process has sent help message help-shmem-api.txt / shmem-abort
[soleil.iferc.local:11934] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

The error message is not helpful at all ...

The abort happens in the vader btl, in mca_btl_vader_put_knem:

    if (OPAL_UNLIKELY(0 != ioctl (mca_btl_vader.knem_fd, KNEM_CMD_INLINE_COPY, &icopy))) {
        return OPAL_ERROR;
    }

ioctl fails with EACCES.

The root cause is that the symmetric memory was "prepared" with vader_prepare_src, which uses

    knem_cr.protection = PROT_READ;

A trivial workaround (probably not good for production) is to use

    knem_cr.protection = PROT_READ|PROT_WRITE;

Then we run into the second issue. In mca_btl_vader_put_knem:

    icopy.remote_offset = 0;

and this is clearly not what we want ...
In my environment, we want to put to 0x0600df0, so the remote_offset should be 0xdf0, since the symmetric memory was "prepared" starting at 0x0600000.

I do not think the vader btl is to be blamed here ... I'd rather think the way yoda uses the btl is not correct (but only for put with the vader btl when knem is used).

I can get the test program to run correctly by manually setting icopy.remote_offset with a debugger.

Please note I fixed a typo in the vader btl, so make sure you update the master.

In the meantime, what about forcing put_via_send to 1 in mca_spml_yoda_put_internal?
/* another option is to unset the MCA_BTL_FLAGS_PUT flag in the vader btl if knem is used, but I do not believe this is a vader issue */

Cheers,

Gilles
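
For reference, the offset arithmetic described above (target 0x0600df0 inside a region prepared at 0x0600000 gives 0xdf0) can be sketched as below. This is only an illustration: region_offset is a hypothetical helper, not a function from Open MPI or knem, and it assumes the caller already knows the base address the region was registered with.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of Open MPI or knem): byte offset of a
 * target address inside a region that was registered at region_base. */
static uint64_t region_offset(uintptr_t region_base, uintptr_t target)
{
    /* The target must not lie before the start of the region. */
    assert(target >= region_base);
    return (uint64_t)(target - region_base);
}
```

With the addresses from this report, region_offset(0x0600000, 0x0600df0) is the value that would have to end up in icopy.remote_offset instead of 0.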