Hi Corey, 

Let me share the results of the testing I've been doing for the past 5 weeks
or so. As in your experience, the results are nowhere near what I was
expecting. What a disappointment. Anyway, here we go.

I am using CentOS 6.3 with the latest updates and patches and the latest
QLogic OFED, version 1.5.3.x; the QLogic drivers in OFED 1.5.4.x do not compile
on CentOS 6.3. I've also tried vanilla OFED 1.5.4.1 and Mellanox OFED 1.5.3.x
with pretty much the same results. I've been testing GlusterFS 3.2.7 and 3.3.0;
there is no significant performance difference between the 3.2 and 3.3 branches.

My hardware is one storage server with a Mellanox QDR dual-port card, two
server nodes with QLogic dual-port mezzanine cards, and a QLogic QDR (HP) blade
switch. The storage server uses a ZFS pool made of 4 striped 2-disk mirrors,
plus a 240 GB SSD for the ZIL and a 240 GB SSD for the L2ARC cache. I also
enabled compression and deduplication. The underlying ZFS performance measured
with iozone (iozone -+u -t 2 -F f1 f2 -r 2048 -s 30G) is between 4 GB/s and
10 GB/s depending on the test. InfiniBand fabric tests using RDMA gave between
3 and 4 GB/s. Please note: GigaBytes, NOT GigaBits, per second.
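For reference, the pool layout and fabric test described above look roughly like this. Device names, the pool name, and the perftest invocation are illustrative, not copied from my actual setup:

```shell
# Hypothetical device names -- adjust for your hardware.
# 4 striped 2-disk mirrors, plus SSDs for the ZIL (log) and L2ARC (cache):
zpool create tank \
    mirror sda sdb mirror sdc sdd mirror sde sdf mirror sdg sdh \
    log sdi cache sdj
zfs set compression=on tank
zfs set dedup=on tank

# Local ZFS throughput (the iozone run quoted above):
iozone -+u -t 2 -F /tank/f1 /tank/f2 -r 2048 -s 30G

# Raw RDMA bandwidth between two hosts (from the perftest package):
ib_send_bw              # run on one node
ib_send_bw storage1     # run on the other node, pointing at the first
```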

So, I was expecting a throughput of around 2.5-3 GB/s over GlusterFS RDMA,
taking overheads into account. Yeah, right, wishful thinking it was!


I built my PoC environment and started testing with just one client, and I was
getting around 400-600 MB/s tops. Writes were about 20% faster than reads.
After some performance tuning on the GlusterFS and ZFS side I managed to
increase throughput to around 700-800 MB/s, with writes still about 20%
faster. Note that when adding the "-o" switch to the iozone command to use
synchronous writes, write throughput was limited to the speed of the ZIL SSD.
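The tuning I did was along these lines. The volume name is hypothetical, and the values are just what I happened to try, not recommendations:

```shell
# Illustrative GlusterFS tuning (volume name "gvol" is made up):
gluster volume set gvol performance.cache-size 1GB
gluster volume set gvol performance.io-thread-count 32
gluster volume set gvol performance.write-behind-window-size 4MB

# ZFS side: a record size matched to large streaming I/O (example value):
zfs set recordsize=128K tank

# Adding -o to iozone forces O_SYNC writes, which end up bounded by the
# ZIL SSD rather than the pool:
iozone -+u -o -t 2 -F /mnt/gluster/f1 /mnt/gluster/f2 -r 2048 -s 30G
```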

While trying to track down the bottleneck I realised that it is on the client
side, since running concurrent tests from two clients gave me about 650 MB/s
per client. Doing a bit more research, it seems the cause of the problem is
FUSE. Googling the issue, I found a number of people complaining that FUSE
throughput is limited to around 600-700 MB/s. There is a kernel patch to
address this, but test results from several people showed only a marginal
improvement in performance: they managed to increase their throughput from
around 600 MB/s to about 850 MB/s or so. Thus, from what I've read, it's
currently not possible to achieve speeds over 1 GB/s with FUSE. This made me
wonder why FUSE was chosen for the GlusterFS client side in the first place.
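The two-client test that pointed at FUSE was simply the same iozone run launched on both clients at once, each against its own files on the same GlusterFS-RDMA mount. Host, volume, and path names here are illustrative:

```shell
# Mount the volume over RDMA on each client (names hypothetical):
mount -t glusterfs -o transport=rdma storage1:/gvol /mnt/gluster

# Run on both clients simultaneously. Each client tops out around
# 650 MB/s while the aggregate roughly doubles, so the cap is per-client
# (the FUSE layer), not on the server:
iozone -+u -t 2 -F /mnt/gluster/$(hostname)-f1 /mnt/gluster/$(hostname)-f2 \
    -r 2048 -s 30G
```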

P.S. If you are looking to use GlusterFS as the backend storage for KVM
virtualisation, I would warn you that it's a tricky business. I've managed to
make things work, but the performance is far worse than even my most
pessimistic expectations! An example: a GlusterFS-RDMA file system mounted on
the server running KVM would give me around 700-850 MB/s throughput, but I was
only getting 50 MB/s max when running the same test from a VM stored on that
partition. In comparison, NFS would give me around 350-400 MB/s. I never
expected GlusterFS to perform worse than NFS.
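In case it helps anyone reproduce the VM-side numbers, the guests were plain disk images sitting on the mounted file system, roughly like this (image name and sizes are made up):

```shell
# Raw disk image on the GlusterFS mount:
qemu-img create -f raw /mnt/gluster/vm1.img 40G

# KVM guest using that image via virtio. The cache mode strongly affects
# throughput on a FUSE-backed store; cache=none is worth comparing against
# the default writethrough behaviour:
qemu-kvm -m 2048 -smp 2 \
    -drive file=/mnt/gluster/vm1.img,if=virtio,cache=none
```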

I would be grateful if anyone could share their experience with GlusterFS over
InfiniBand, and any tips on improving performance.

cheers 

Andrei 


----- Original Message -----

From: "Corey Kovacs" <[email protected]> 
To: [email protected] 
Sent: Friday, 7 September, 2012 2:45:48 PM 
Subject: [Gluster-users] Throughout over infiniband 

Folks, 

I finally got my hands on a 4x FDR (56Gb) Infiniband switch and 4 cards to do 
some testing of GlusterFS over that interface. 

So far, I am not getting the throughput I _think_ I should see. 

My config is made up of.. 

4 dl360-g8's (three bricks and one client) 
4 4xFDR, dual port IB cards (one port configured in each card per host) 
1 4xFDR 36 port Mellanox Switch (managed and configured) 
GlusterFS 3.2.6 
RHEL6.3 

I have tested the IB cards and get about 6 GB/s between hosts over raw IB.
Using IPoIB, I can get about 22 Gb/sec. Not too shabby for a first go, but I
expected more (cards are in connected mode with an MTU of 64k).

My raw speed to the disks (through the buffer cache... I just realized I've
not tested direct mode I/O, I'll do that later today) is about 800MB/sec. I
expect to see on the order of 2GB/sec (a little less than 3x800).

When I write a large stream using dd and watch the bricks' I/O, I see
~800MB/sec on each one, but at the end of the test the report from dd still
indicates 800MB/sec.

Am I missing something fundamental? 

Any pointers would be appreciated, 


Thanks! 


Corey 


_______________________________________________ 
Gluster-users mailing list 
[email protected] 
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users 
