On 07/16/2012 09:16 AM, Philippe Muller wrote:
Hi RedHat & GlusterFS users,

Last weekend, I worked on a GlusterFS cluster upgrade, from 3.0.3 to 3.3.0. We were using hand-made volume files defining 2 volumes, a distributed one and a distributed-replicated one, both using the "transport-type ib-verbs" option.

One of our objectives was to use the "gluster" CLI tool (which didn't exist in 3.0.3, from what I remember).

Here is what we did:
1 - Shut down all glusterfs instances
2 - Install GlusterFS 3.3.0
3 - Start glusterd on all hosts
4 - Create a trusted pool with all our hosts
5 - Create "compatible volumes" using the CLI tool; using the same bricks we were using with our hand-made volfiles and using the "rdma" transport (since ib-verbs was no longer an option...)
6 - Mount the volumes
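
Steps 4 to 6 can be sketched as a small dry-run helper that only builds the command lines and executes nothing. This is a minimal sketch, not the commands actually run during the upgrade: the host names, brick paths, volume name and mount point are hypothetical placeholders. The "gluster volume start" step is not listed above, but a volume created through the CLI has to be started before it can be mounted.

```python
import shlex


def build_commands(hosts, volume, bricks, transport="rdma",
                   replica=None, mount_point="/mnt/gluster"):
    """Return the shell commands for steps 4-6 as strings (nothing is executed).

    All arguments are placeholders chosen by the caller; none of them come
    from the original post.
    """
    cmds = []
    # Step 4: from the first host, probe every other host to form the trusted pool.
    for host in hosts[1:]:
        cmds.append(["gluster", "peer", "probe", host])
    # Step 5: create the volume on the existing bricks with the chosen transport.
    create = ["gluster", "volume", "create", volume]
    if replica:
        create += ["replica", str(replica)]
    create += ["transport", transport] + list(bricks)
    cmds.append(create)
    # Assumed extra step: the volume must be started before clients can mount it.
    cmds.append(["gluster", "volume", "start", volume])
    # Step 6: mount the volume with the native client.
    cmds.append(["mount", "-t", "glusterfs", f"{hosts[0]}:/{volume}", mount_point])
    return [" ".join(shlex.quote(part) for part in cmd) for cmd in cmds]


if __name__ == "__main__":
    for line in build_commands(
        ["node1", "node2"],
        "vol0",
        ["node1:/export/brick1", "node2:/export/brick1"],
        replica=2,
    ):
        print(line)
```

Passing transport="tcp" instead of "rdma" gives the variant for a TCP-only volume.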

Of course, we tested that scenario on VMs. No issues with data. We tested everything except... RDMA!

When we finally made the upgrade, everything went fine, except mounting the volumes. We got error messages like this one in the log files: "E [rdma.c:4458:tcp_connect_finish] 0-zodiac-client-3: tcp connect to  failed (Connection refused)"
(notice the 2 white spaces between "connect to" and "failed")
That reminded me of an issue when we had a problem with the subnet manager 
running on the IB switch. But this time, the switch wasn't responsible; IPoIB 
was still running fine...
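
To spot this symptom across many client logs, the error line quoted above can be matched with a regular expression. This is a minimal sketch: the pattern is derived only from that single quoted line, and reading the double space as an empty peer-address field is an interpretation, not something confirmed in this thread.

```python
import re

# Pattern modelled on the one log line quoted in the post (an assumption:
# other GlusterFS versions may format this message differently).
CONNECT_FAILURE = re.compile(
    r"\[(?P<source>[\w.]+):(?P<line>\d+):(?P<func>\w+)\]\s+"
    r"(?P<xlator>\S+): tcp connect to\s+(?P<peer>\S*?)\s*failed "
    r"\((?P<reason>[^)]*)\)"
)


def parse_connect_failure(log_line):
    """Return the fields of a 'tcp connect to ... failed' line, or None."""
    match = CONNECT_FAILURE.search(log_line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    # An empty peer means nothing was printed between "connect to" and "failed".
    fields["peer_missing"] = fields["peer"] == ""
    return fields


if __name__ == "__main__":
    sample = ("E [rdma.c:4458:tcp_connect_finish] 0-zodiac-client-3: "
              "tcp connect to  failed (Connection refused)")
    print(parse_connect_failure(sample))
```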



I scratched my head more than once, thinking about what I could possibly have 
forgotten. Then I searched for all information I could find about RDMA and 
3.3.0.

Here is what I found:
- On page 123 of the "GlusterFS Administration Guide 3.3.0", a small note saying: 
"NOTE: with 3.3.0 release, transport type 'rdma' and 'tcp,rdma' are not fully supported."


- On July 7, Ling Ho started a thread on this mailing list, with very similar 
symptoms: http://www.mail-archive.com/[email protected]/msg09326.html ; 
but he didn't get any answer.



Given the urgency of the upgrade, we weren't sure rolling back to 3.0.3 was a 
good option (since we didn't know precisely which XFS attributes were modified 
by 3.3.0 on the backend FS). So we switched to TCP (over IPoIB).
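
One way to see what is stored on the backend before deciding on a rollback is to dump the extended attributes of a brick directory. This is a minimal sketch, assuming Linux (os.listxattr and os.getxattr are not available elsewhere) and assuming the attributes of interest live in the "trusted." namespace, which normally requires root to read; the brick path in the example is hypothetical.

```python
import os


def brick_xattrs(path, prefix="trusted."):
    """Return {name: hex value} for xattrs on `path` whose name starts with `prefix`.

    Returns an empty dict if the path cannot be inspected (missing path,
    filesystem without xattr support, insufficient permissions).
    """
    result = {}
    try:
        names = os.listxattr(path)
    except OSError:
        return result
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            result[name] = os.getxattr(path, name).hex()
        except OSError:
            # The attribute was listed but could not be read.
            result[name] = None
    return result


if __name__ == "__main__":
    # Hypothetical brick path; replace it with a real brick directory.
    for name, value in sorted(brick_xattrs("/export/brick1").items()):
        print(name, value)
```

Comparing the output from a brick touched by 3.3.0 with one from an untouched copy would show which attributes were added or changed.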


It's working. We are now running 3.3.0. But we are no longer taking advantage 
of RDMA.

So here are a few questions:
- Did I miss something that prevented me from using RDMA in 3.3.0?
- Is there a way to use RDMA in 3.3.0?

- Is there any official communication about the 3.3.0 RDMA issue?
- Is a 3.3.x release with RDMA support planned? If so, when?
- Will the RDMA transport be dropped in future releases?

Thanks !
(and yeah, despite that issue, I still love GlusterFS :-)


Philippe Muller

I just came back from a one-week vacation. Yes, I didn't get any reply from the list, and was not able to get RDMA working when the server is configured for tcp,rdma. When I was testing, I had set up the server using rdma only and totally missed this.

I ended up using TCP over IPoIB. The performance is much better than TCP over the 10 Gb/s network. However, since I am in a mixed environment, I have to do some static routing on the gluster server: basically, routing the IPoIB subnet to the 10 Gb/s subnet on which the bricks are all set up. Things have been working fine.

...
ling




_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
