Hi,

While running the ib_rdma_bw test below on a 32-bit platform, I am getting
unexpectedly low throughput.

Server: ib_rdma_bw -p 5019 -s 1048576 -t 500 -n 5000 -b -c
Client: ib_rdma_bw -p 5019 -s 1048576 -t 500 -n 5000 -b -c 100.168.54.49

(If the iteration count is changed to 500, I get the expected throughput.)

Looking at the code, I found that ib_rdma_bw.c in the perftest package has
the following code:
{
       double cycles_to_units;
       unsigned long tsize;    /* Transferred size, in megabytes */
       ....
       ....
       cycles_to_units = get_cpu_mhz(0) * 1000000;

       printf("%d: Bandwidth average: %g MB/sec\n", pid,
              tsize * iters * cycles_to_units /
              (tcompleted[iters - 1] - tposted[0]) / 0x100000);
}


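Here, tsize is an "unsigned long", which is 4 bytes on 32-bit platforms and 8 bytes on 64-bit platforms. When I run the test with a 1 MB data size and 5000 iterations as above, the product (tsize * iters) overflows: 1048576 * 5000 = 5,242,880,000, which exceeds the 32-bit "unsigned long" maximum of 4,294,967,295. The multiplication wraps around and the reported throughput comes out unexpectedly low. With 500 iterations the product is 524,288,000, which still fits, so the result is as expected.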
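
To make the wraparound concrete, here is a minimal sketch. It is not taken from perftest; the function names are hypothetical, and uint32_t stands in for a 32-bit "unsigned long" so that the effect can be reproduced on a 64-bit machine as well.

```c
#include <stdint.h>

/* Hypothetical helper: the product as a 32-bit "unsigned long" would
 * compute it. Unsigned arithmetic wraps modulo 2^32 instead of failing. */
uint32_t total_bytes_32bit(uint32_t tsize, int iters)
{
    return tsize * (uint32_t)iters;
}

/* Hypothetical helper: the same product with 64 bits of headroom. */
uint64_t total_bytes_64bit(uint64_t tsize, int iters)
{
    return tsize * (uint64_t)iters;
}
```

For tsize = 1048576 and iters = 5000, the 64-bit version returns 5242880000, while the 32-bit version returns 947912704 (that is, 5242880000 - 4294967296). For iters = 500 both return 524288000.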

The correct fix should be applied in the ib_rdma_bw application: either change the calculation from (tsize * iters * cycles_to_units) to (cycles_to_units * tsize * iters), so that the multiplication is carried out in double from the first operand onward, or change tsize to double.
Should I go ahead and submit a patch?
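
A minimal sketch of the reordering, for illustration only. The function names and parameters are hypothetical and this is not the actual patch; uint32_t again models a 32-bit "unsigned long", and elapsed_cycles stands for (tcompleted[iters - 1] - tposted[0]).

```c
#include <assert.h>
#include <stdint.h>

/* Original operand order: tsize * iters is evaluated in 32-bit integer
 * arithmetic first, so it can wrap before being converted to double. */
double bw_original_order(uint32_t tsize, int iters,
                         double cycles_to_units, double elapsed_cycles)
{
    assert(elapsed_cycles > 0.0);
    return tsize * (uint32_t)iters * cycles_to_units
           / elapsed_cycles / 0x100000;
}

/* Reordered: multiplication associates left to right, so starting with
 * the double operand promotes every later operand to double. */
double bw_double_first(uint32_t tsize, int iters,
                       double cycles_to_units, double elapsed_cycles)
{
    assert(elapsed_cycles > 0.0);
    return cycles_to_units * tsize * iters
           / elapsed_cycles / 0x100000;
}
```

With made-up numbers (tsize = 1048576, iters = 5000, cycles_to_units = 1e9, elapsed_cycles = 5e9), the reordered version gives 1000 MB/sec, while the original order gives roughly 180.8 MB/sec because of the wrapped product.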

Viral Mehta, Embedded Software Engineer, www.einfochips.com

P.S. - However, I do understand that a double has its limits as well if we run the test with a larger data size and more iterations. A better way to calculate bandwidth would be to compute it after every fixed number of iterations (say 100).
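
A rough sketch of what the per-chunk calculation could look like. The helper name, its signature, and the use of uint64_t timestamps are all assumptions made for illustration; perftest's own types may differ, and with many requests outstanding at once the per-chunk numbers are only approximate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the bandwidth in MB/sec of each full chunk
 * of `chunk` iterations into out[] and returns how many were written.
 * Each chunk is timed from its first post to its last completion. */
size_t chunked_bandwidth(const uint64_t *tposted, const uint64_t *tcompleted,
                         size_t iters, size_t chunk, double tsize,
                         double cycles_to_units, double *out, size_t out_len)
{
    size_t n = 0;

    if (chunk == 0)
        return 0;

    for (size_t start = 0; start + chunk <= iters && n < out_len;
         start += chunk) {
        uint64_t elapsed = tcompleted[start + chunk - 1] - tposted[start];

        if (elapsed == 0)
            break; /* avoid dividing by zero on degenerate timestamps */

        /* Every operand is a double here, and each product stays small
         * because it covers only `chunk` iterations. */
        out[n++] = tsize * (double)chunk * cycles_to_units
                   / (double)elapsed / 0x100000;
    }
    return n;
}
```

A trailing partial chunk is ignored in this sketch; a real patch would have to decide how to report it.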


_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
