2008/9/17 Rui Machado <[EMAIL PROTECTED]>:
> From: Rui Machado <[EMAIL PROTECTED]>
> Date: 2008/9/17
> Subject: Re: [ofa-general] atomic operations on ppc64
> To: Dotan Barak <[EMAIL PROTECTED]>
>
>
> 2008/9/17 Dotan Barak <[EMAIL PROTECTED]>:
>> On Wed, Sep 17, 2008 at 5:54 PM, Rui Machado <[EMAIL PROTECTED]> wrote:
>>> 2008/9/17 Dotan Barak <[EMAIL PROTECTED]>:
>>>> On Wed, Sep 17, 2008 at 5:44 PM, Rui Machado <[EMAIL PROTECTED]> wrote:
>>>>> 2008/9/17 Dotan Barak <[EMAIL PROTECTED]>:
>>>>>> On Wed, Sep 17, 2008 at 5:28 PM, Rui Machado <[EMAIL PROTECTED]> wrote:
>>>>>>> Hey Dotan,
>>>>>>>
>>>>>>> 2008/9/17 Dotan Barak <[EMAIL PROTECTED]>:
>>>>>>>> On Wed, Sep 17, 2008 at 5:12 PM, Rui Machado <[EMAIL PROTECTED]> wrote:
>>>>>>>>> Hi list,
>>>>>>>>>
>>>>>>>>> has anyone experienced problems using IB atomic operations
>>>>>>>>> (fetch and add) on a ppc64 platform?
>>>>>>>>> I tried a small example (using fetch and add) on x86 and ppc64; it
>>>>>>>>> worked fine on x86 but not on ppc64.
>>>>>>>>
>>>>>>>> Do you handle the ntoh/hton or do you let the driver/HCA deal with it 
>>>>>>>> by itself?
>>>>>>>
>>>>>>> Nope, I don't use those. I guess I'm letting the driver/HCA deal
>>>>>>> with it, then.
>>>>>>
>>>>>> Do you see endianness issues or completely corrupted data?
>>>>>>
>>>>>
>>>>> Just to make it clear (to me :) ): I'm talking about ppc64<-->ppc64
>>>>> communication.
>>>>> Should I still be concerned with converting data because of endianness?
>>>>> What happens is that I request a fetch and add and it doesn't happen.
>>>>> The value on the server doesn't get modified.
>>>>
>>>> This is a weird behaviour indeed ..
>>>>
>>>> Can you post the code in your program that fills the SR?
>>>>
>>>> Dotan
>>>>
>>>
>>> Not sure what you mean by SR.
>>> Here is the function inc(), which I call to add 1 to the counter on the
>>> remote machine. The remote machine has its buffer full of zeroes.
>>> That's what the client gets all the time, although I increment 3 times
>>> in a row (with a sleep in between).
>>>
>>> Is this enough?
>>> Thanks for the help
>>>
>>> void inc()
>>> {
>>>
>>>        struct ibv_qp_attr check_attr;
>>>        struct ibv_qp_init_attr check_init_attr;
>>>
>>>        void *ev_ctx;
>>>
>>>        struct ibv_send_wr *bad_wr;
>>>        struct ibv_wc wc;
>>>        struct ibv_sge slist;
>>>        struct ibv_send_wr swr3;
>>>
>>>
>>>        slist.addr = (uintptr_t)buffer;
>>>        slist.length = 8;
>>>        slist.lkey =mr->lkey;
>>>
>>>        swr3.wr.atomic.remote_addr = remote_node->mi.bufAddr;
>>>        swr3.wr.atomic.rkey = remote_node->mi.buf_rkey;
>>>        swr3.wr.atomic.compare_add = 1;
>>>
>>>        swr3.wr_id      = 1;
>>>        swr3.sg_list    = &slist;
>>>        swr3.num_sge    = 1;
>>>        swr3.opcode     = IBV_WR_ATOMIC_FETCH_AND_ADD;
>>>        swr3.send_flags = IBV_SEND_SIGNALED;
>>>        swr3.next       = NULL;
>>>
>>>
>>>        if(ibv_post_send(qp,&swr3,&bad_wr)){
>>>                printf("Couldn't post send...\n");
>>>                return;
>>>        }
>>>
>>>
>>>        int ne=0;
>>>        do{
>>>                ne = ibv_poll_cq(cq,1,&wc);
>>>        }while(ne==0);
>>>
>>>        if((ne < 0) || (wc.status != IBV_WC_SUCCESS)){
>>>
>>>                //check qp status
>>>                if(!ibv_query_qp(qp,&check_attr,IBV_QP_STATE,&check_init_attr))
>>>                        printf("The qp state is: %d\n ",check_attr.qp_state);
>>>
>>>        }
>>> }
>>>
>>
>> The code looks good and it should work...
>> (I would have memset every structure before using it ..)
>>
>>
>> Did you check the memory on the sender side or on the receiver side?
>>
>
> As I mentioned it does work on x86.
>
> Actually on both:
>
> server:
> Initial counter at buffer is 0
> counter at buffer is 0
> counter at buffer is 0
> counter at buffer is 0
> counter at buffer is 0
> counter at buffer is 0
> counter at buffer is 0
> counter at buffer is 0
>
>
> client:
> initial IB atomic counter 0
> IB atomic counter 0
> IB atomic counter 0
> IB atomic counter 0
>
> What could this be related to? Driver, HW?
>

Does anyone have some insight on this?
How can I debug this further?

Cheers
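
On the ntoh/hton question raised earlier in the thread: one check that does not depend on the HCA or driver is to look at the 8 bytes in the local buffer under both interpretations, host order and big-endian (network) order. The sketch below is only an illustration in plain C. The function names load_be64, load_host64 and host_is_big_endian are hypothetical helpers, not part of libibverbs, and the sketch says nothing about what a particular HCA actually writes into the buffer. On a big-endian host, which is the usual case for ppc64 systems of this period, the two interpretations agree. On little-endian x86 they differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Interpret 8 bytes as a big-endian (network order) 64-bit integer.
 * Works the same on any host because it never reads the bytes as a word. */
uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Interpret the same 8 bytes in the byte order of the machine running
 * this code. memcpy avoids alignment and aliasing problems. */
uint64_t load_host64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Return 1 if this machine stores the most significant byte first. */
int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}
```

Printing both load_be64(buffer) and load_host64(buffer) after each completion would show whether the counter is really stuck at zero or only looks wrong because of byte order. In the output posted above the value is 0 in every byte order, so byte order alone would not explain an increment that never appears.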