I am testing InfiniBand for the first time. It seems I should be getting a lot 
more speed than I am, even on some pretty basic tests. Can anyone running 
InfiniBand confirm whether what I am seeing is way out of line, and/or help 
diagnose? 

I have two systems connected and running GlusterFS 3.1.2qa3. With 3.1.1, the 
InfiniBand transport wouldn't even start; it gave an error about being unable 
to initialize RDMA. But with the latest version and an upgrade to OFED 1.5.2, 
everything starts up with no errors and I can create a volume and mount it. 

The underlying InfiniBand fabric seems OK, and a basic ibv_rc_pingpong test 
shows I can move data pretty fast:
81920000 bytes in 0.23 seconds = 2858.45 Mbit/sec
10000 iters in 0.23 seconds = 22.93 usec/iter
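
For reference, the arithmetic behind those figures checks out (using 1 Mbit = 
10^6 bits; the slight difference from the reported 2858.45 is just rounding in 
the 0.23s figure):

```python
# Sanity-check of the ibv_rc_pingpong numbers quoted above.
bytes_moved = 81_920_000
seconds = 0.23
iters = 10_000

mbit_per_sec = bytes_moved * 8 / seconds / 1e6   # throughput in Mbit/s
usec_per_iter = seconds / iters * 1e6            # time per ping-pong iteration

print(round(mbit_per_sec, 2))   # ~2849 Mbit/s
print(round(usec_per_iter, 2))  # ~23 usec
```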

So now I have two volumes, one using TCP over a gig-e link and one using RDMA. 
I mount them both and run some file copy tests... and they perform almost 
exactly the same? What? 

gluster volume info

Volume Name: test2_volume
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: bravo:/cluster/shadow/test2
Brick2: backup:/cluster/shadow/test2

Volume Name: test_volume
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: bravo:/cluster/shadow/test
Brick2: backup:/cluster/shadow/test
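
For completeness, the two volumes were created with commands along these lines 
(hostnames and paths match the brick lists above; the exact invocation is from 
memory):

```shell
# Replicated volume over the gig-e link (TCP transport):
gluster volume create test2_volume replica 2 transport tcp \
    bravo:/cluster/shadow/test2 backup:/cluster/shadow/test2

# Replicated volume over InfiniBand (RDMA transport):
gluster volume create test_volume replica 2 transport rdma \
    bravo:/cluster/shadow/test backup:/cluster/shadow/test

gluster volume start test2_volume
gluster volume start test_volume
```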

mount:
glusterfs#localhost:/test_volume on /mnt/test type fuse 
(rw,allow_other,default_permissions,max_read=131072)
glusterfs#localhost:/test2_volume on /mnt/test2 type fuse 
(rw,allow_other,default_permissions,max_read=131072)


time cp files.tar /mnt/test2/

real    0m11.159s
user    0m0.123s
sys     0m1.244s

files.tar is a single 390MB file, so this is about 35MB/s. Fine for gig-e. 
----------------------------

time cp files.tar /mnt/test/

real    0m5.656s
user    0m0.116s
sys     0m0.962s

69MB/s... ehh. Faster, at least. On a few runs it was not any faster at all. 
Maybe a cache effect? 
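
Working the throughput numbers out explicitly (390MB divided by the wall-clock 
time of each run):

```python
# Effective copy throughput for the two files.tar runs above.
size_mb = 390

tcp_seconds = 11.159   # replicated volume over gig-e
rdma_seconds = 5.656   # replicated volume over IB/RDMA

tcp_mb_per_s = size_mb / tcp_seconds
rdma_mb_per_s = size_mb / rdma_seconds

print(round(tcp_mb_per_s))   # ~35 MB/s
print(round(rdma_mb_per_s))  # ~69 MB/s
```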
----------------------------

time cp -av /usr/src/kernels /mnt/test2/
real    0m49.605s
user    0m0.681s
sys     0m2.593s

The kernels directory is 34MB of small files. I thought the low latency of IB 
would really show an improvement here. 
-----------------------------

time cp -av /usr/src/kernels /mnt/test/

real    0m56.046s
user    0m0.625s
sys     0m2.675s

It took LONGER? That can't be right. 
------------------------------

And finally, this error appears in the rdma mount log every 3 seconds on both 
nodes (note the blank hostname in the message):

[2011-01-10 19:46:56.728127] E [rdma.c:4428:tcp_connect_finish] 
test_volume-client-1: tcp connect to  failed (Connection refused)
[2011-01-10 19:46:59.738291] E [rdma.c:4428:tcp_connect_finish] 
test_volume-client-1: tcp connect to  failed (Connection refused)
[2011-01-10 19:47:02.748260] E [rdma.c:4428:tcp_connect_finish] 
test_volume-client-1: tcp connect to  failed (Connection refused)
[... same message repeats every 3 seconds ...]

But there are no restrictions in the config; everything is set to allow *. So 
my questions are: can anyone else tell me what kind of basic file copy 
performance they see over IB? And what can I do to troubleshoot?

Thanks List and Devs, 

Chris
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
