Hi,
First, I'm glad to say my MOSIX component is working and giving good
initial results. Thanks for all your help!
I would like to contribute the code to the Open MPI project, but I'm not
sure how (I know I should fill in some license agreement documents).
Is there an official code-review process? Is there anything else to do
other than testing it on some machines and committing it if/when I get
the permissions?
Second, I have a question about in-place send buffers. My
mca_btl_mosix_prepare_src() currently works like this:
mca_btl_base_descriptor_t*
mca_btl_mosix_prepare_src(struct mca_btl_base_module_t* btl,
                          struct mca_btl_base_endpoint_t* endpoint,
                          struct mca_mpool_base_registration_t* registration,
                          struct opal_convertor_t* convertor,
                          uint8_t order,
                          size_t reserve,
                          size_t* size,
                          uint32_t flags)
{
    mca_btl_mosix_frag_t* frag;
    struct iovec iov;
    uint32_t iov_count = 1;
    size_t result;
    int rc;

    /* Enforce upper message length limit */
    if( OPAL_UNLIKELY((reserve + *size) > btl->btl_max_send_size) ) {
        *size = btl->btl_max_send_size - reserve;
    }

    /* Fetch a fragment to work on */
    if( *size + reserve <= btl->btl_eager_limit ) {
        MCA_BTL_MOSIX_FRAG_ALLOC_EAGER(frag, rc);
    } else {
        MCA_BTL_MOSIX_FRAG_ALLOC_MAX(frag, rc);
    }
    if( OPAL_UNLIKELY(NULL == frag) ) {
        return NULL;
    }
    frag->segments[0].seg_addr.pval = (void*)(frag + 1);
    frag->segments[0].seg_len = reserve;

    /* Fill it with outgoing data */
    iov.iov_len = frag->size - reserve;
    /**************** if( opal_convertor_need_buffers(convertor) ) { ****************/
    if( 0 != reserve ) {
        /* Use existing buffer at the end of the fragment */
        iov.iov_base = (unsigned char*)frag->segments[0].seg_addr.pval + reserve;
        rc = opal_convertor_pack( convertor, &iov, &iov_count, &result );
        if( 0 > rc ) {
            MCA_BTL_MOSIX_FRAG_RETURN(frag);
            return NULL;
        }
        frag->segments[0].seg_len += result;
        frag->base.des_src_cnt = 1;
    } else {
        iov.iov_base = NULL;
        /* Read the iovec for the buffer to be transferred */
        rc = opal_convertor_pack( convertor, &iov, &iov_count, &result );
        if( rc < 0 ) {
            MCA_BTL_MOSIX_FRAG_RETURN(frag);
            return NULL;
        }
        frag->segments[1].seg_addr.pval = iov.iov_base;
        frag->segments[1].seg_len = result;
        frag->base.des_src_cnt = 2;
    }

    frag->base.des_src = frag->segments;
    frag->base.order = MCA_BTL_NO_ORDER;
    frag->base.des_dst = NULL;
    frag->base.des_dst_cnt = 0;
    frag->base.des_flags = flags;
    return &frag->base;
}
Notice that the condition on the convertor, which I tried to copy from
the TCP equivalent, is commented out. If I switch to that condition, I get:
[singularity:3774] *** An error occurred in MPI_Barrier
[singularity:3774] *** reported by process [3220963329,0]
[singularity:3774] *** on communicator MPI_COMM_WORLD
[singularity:3774] *** MPI_ERR_TRUNCATE: message truncated
[singularity:3774] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[singularity:3774] *** and potentially your MPI job)
[singularity:03773] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[singularity:03773] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
alex@singularity:~/huji/benchmarks/simple$
I understand that at the moment the buffer sent by the user is copied to
(void*)(frag+1), even when it would be best to leave it in place, with
the reserved data at frag->segments[0] and the user buffer at
frag->segments[1]. Does anyone have an idea what would cause the
truncation error when the in-place branch is taken? Maybe a problem in
the receiver-side function?
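To make the layout I am aiming for concrete, here is a minimal standalone
sketch. It is not Open MPI code: demo_segment_t, demo_send_segments() and
demo_roundtrip() are hypothetical names, a 4-byte value stands in for the
reserved bytes, and a UNIX socketpair with writev() stands in for the real
transport. It only illustrates that with two segments the sender transmits
both in one gather call, and the receiver has to expect the sum of both
lengths.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define DEMO_HEADER  0xabcd1234u            /* stands in for the reserved bytes */
#define DEMO_PAYLOAD 0x1122334455667788ull  /* stands in for the user buffer */

/* Hypothetical stand-in for a BTL segment: an address plus a length. */
typedef struct {
    void  *addr;
    size_t len;
} demo_segment_t;

/* Gather-send one or two segments with a single writev() call.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t demo_send_segments(int fd, const demo_segment_t *segs, int count)
{
    struct iovec iov[2];
    int i;

    assert(count >= 1 && count <= 2);
    for (i = 0; i < count; i++) {
        iov[i].iov_base = segs[i].addr;
        iov[i].iov_len  = segs[i].len;
    }
    return writev(fd, iov, count);
}

/* Send a header segment and an in-place payload segment, then read them
 * back. Returns the number of bytes received into out, or -1 on error. */
static ssize_t demo_roundtrip(unsigned char *out, size_t out_len)
{
    int fds[2];
    uint32_t header  = DEMO_HEADER;
    uint64_t payload = DEMO_PAYLOAD;   /* sent from where it lives, no copy */
    demo_segment_t segs[2];
    ssize_t sent, got;
    size_t total = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return -1;
    }
    segs[0].addr = &header;   segs[0].len = sizeof(header);
    segs[1].addr = &payload;  segs[1].len = sizeof(payload);

    sent = demo_send_segments(fds[1], segs, 2);
    if (sent != (ssize_t)(sizeof(header) + sizeof(payload))) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The receiver must expect header length plus payload length. */
    while (total < (size_t)sent && total < out_len) {
        got = read(fds[0], out + total, out_len - total);
        if (got <= 0) {
            break;
        }
        total += (size_t)got;
    }
    close(fds[0]);
    close(fds[1]);
    return (ssize_t)total;
}
```

Calling demo_roundtrip() with a buffer of at least 12 bytes should deliver
the 4 header bytes followed by the 8 payload bytes. In my component the
equivalent check would be whether the length announced to the receiver
covers every entry counted by des_src_cnt, not only segments[0].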
Thanks,
Alex
P.S. I know this problem happens with 8-byte messages, but 4-byte
messages pass OK. I don't know if it helps.