Hi,

First, I'm glad to say my MOSIX component is working and giving good initial results. Thanks for all your help! I would like to contribute the code to the Open MPI project, but I'm not sure how (I know I should fill in some license agreement documents). Is there an official code-review process? Is there anything else to do other than test it on some machines and commit it if/when I get the permissions?

Second, I have a question about in-place send buffers. My mca_btl_mosix_prepare_src() currently works like this:

mca_btl_base_descriptor_t*
mca_btl_mosix_prepare_src(struct mca_btl_base_module_t* btl,
                          struct mca_btl_base_endpoint_t* endpoint,
                          struct mca_mpool_base_registration_t* registration,
                          struct opal_convertor_t* convertor,
                          uint8_t order,
                          size_t reserve,
                          size_t* size,
                          uint32_t flags)
{
    mca_btl_mosix_frag_t* frag;
    struct iovec iov;
    uint32_t iov_count = 1;
    size_t result;
    int rc;

    /* Enforce upper message length limit */
    if( OPAL_UNLIKELY((reserve + *size) > btl->btl_max_send_size) ) {
        *size = btl->btl_max_send_size - reserve;
    }

    /* Fetch a fragment to work on */
    if( *size + reserve <= btl->btl_eager_limit ) {
        MCA_BTL_MOSIX_FRAG_ALLOC_EAGER(frag, rc);
    } else {
        MCA_BTL_MOSIX_FRAG_ALLOC_MAX(frag, rc);
    }
    if( OPAL_UNLIKELY(NULL == frag) ) {
        return NULL;
    }
    frag->segments[0].seg_addr.pval = (void*)(frag + 1);
    frag->segments[0].seg_len = reserve;

    /* Fill it with outgoing data */
    iov.iov_len = frag->size - reserve;
    /**************** if( opal_convertor_need_buffers(convertor) ) { ****************/
    if( 0 != reserve ) {
        /* Use existing buffer at the end of the fragment */
        iov.iov_base = (unsigned char*)frag->segments[0].seg_addr.pval + reserve;
        rc = opal_convertor_pack( convertor, &iov, &iov_count, &result );
        if( 0 > rc ) {
            MCA_BTL_MOSIX_FRAG_RETURN(frag);
            return NULL;
        }
        frag->segments[0].seg_len += result;
        frag->base.des_src_cnt = 1;
    } else {
        iov.iov_base = NULL;
        /* Read the iovec for the buffer to be transferred */
        rc = opal_convertor_pack( convertor, &iov, &iov_count, &result );
        if( rc < 0 ) {
            MCA_BTL_MOSIX_FRAG_RETURN(frag);
            return NULL;
        }
        frag->segments[1].seg_addr.pval = iov.iov_base;
        frag->segments[1].seg_len = result;
        frag->base.des_src_cnt = 2;
    }
    frag->base.des_src = frag->segments;
    frag->base.order = MCA_BTL_NO_ORDER;
    frag->base.des_dst = NULL;
    frag->base.des_dst_cnt = 0;
    frag->base.des_flags = flags;
    return &frag->base;
}

Notice that the condition on the convertor, which I tried to copy from the TCP equivalent, is commented out. If I switch to that condition, I get:

[singularity:3774] *** An error occurred in MPI_Barrier
[singularity:3774] *** reported by process [3220963329,0]
[singularity:3774] *** on communicator MPI_COMM_WORLD
[singularity:3774] *** MPI_ERR_TRUNCATE: message truncated
[singularity:3774] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[singularity:3774] ***    and potentially your MPI job)
[singularity:03773] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[singularity:03773] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
alex@singularity:~/huji/benchmarks/simple$

As I understand it, at the moment the user's buffer is copied to (void*)(frag + 1) even when it would be best left in place, with the reserved data in frag->segments[0] and the user buffer in frag->segments[1]. Does anyone have an idea what would cause the error above? Maybe a problem in the receiver-side function?
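
To make the two-segment layout concrete, here is a small standalone sketch. It is not Open MPI or MOSIX code: sketch_segment, sketch_send_segments() and sketch_roundtrip() are names made up for illustration, and a pipe with writev() stands in for the real transport. It only shows a header segment and an untouched user buffer going out together and arriving as one contiguous message.

```c
/* Standalone sketch, NOT Open MPI code: every name here is hypothetical. */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for one descriptor segment (address + length). */
struct sketch_segment {
    void  *addr;
    size_t len;
};

/*
 * Write up to two segments to fd with a single gather call, so the user
 * buffer in segment 1 is sent in place, right after the header in segment 0.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t sketch_send_segments(int fd, const struct sketch_segment *segs,
                                    int count)
{
    struct iovec iov[2];
    int i;

    assert(count >= 1 && count <= 2);
    for (i = 0; i < count; i++) {
        iov[i].iov_base = segs[i].addr;
        iov[i].iov_len  = segs[i].len;
    }
    return writev(fd, iov, count);
}

/*
 * Round trip over a pipe (assumed transport for this sketch only): send the
 * header and payload as two segments, then read them back into out.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t sketch_roundtrip(const void *hdr, size_t hdr_len,
                                const void *payload, size_t payload_len,
                                void *out, size_t out_len)
{
    int fds[2];
    struct sketch_segment segs[2];
    ssize_t sent, got;

    if (pipe(fds) != 0) {
        return -1;
    }
    segs[0].addr = (void *)hdr;      /* reserved/header bytes */
    segs[0].len  = hdr_len;
    segs[1].addr = (void *)payload;  /* user buffer, not copied */
    segs[1].len  = payload_len;

    sent = sketch_send_segments(fds[1], segs, 2);
    close(fds[1]);
    got = (sent < 0) ? -1 : read(fds[0], out, out_len);
    close(fds[0]);
    return got;
}
```

With a 4-byte header and an 8-byte payload, the reader should see 12 contiguous bytes: the header followed by the payload. In this sketch both segments have to be handed to the transport, otherwise the payload never leaves the sender.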

Thanks,
Alex

P.S. I know this problem happens with 8-byte messages, but 4-byte messages pass OK. I don't know if that helps.
