On 07/16/2012 09:16 AM, Philippe Muller wrote:
Hi Red Hat & GlusterFS users,
Last weekend, I worked on a GlusterFS cluster upgrade from 3.0.3 to
3.3.0.
We were using hand-made volume files defining two volumes, a distributed
one and a distributed-replicated one, both using the "transport-type
ib-verbs" option.
One of our objectives was to use the "gluster" CLI tool (which didn't
exist in 3.0.3, as far as I remember).
Here is what we did:
1 - Shut down all glusterfs instances
2 - Install Gluster 3.3.0
3 - Start glusterd on all hosts
4 - Create a trusted pool with all our hosts
5 - Create "compatible volumes" with the CLI tool, reusing the same
bricks as in our hand-made volfiles and the "rdma" transport
(since ib-verbs was no longer an option...)
6 - Mount the volumes
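For anyone repeating steps 4 and 5, here is a minimal Python sketch that only builds and prints the gluster CLI commands rather than running them. The host names (server1 to server4), brick paths and volume names are hypothetical placeholders, and the argument order assumes the 3.3 syntax "gluster volume create NAME [replica COUNT] [transport tcp|rdma|tcp,rdma] BRICK...". Adjust it to your own pool before running anything by hand.

```python
import shlex

# Hypothetical host names; substitute the hosts of your own trusted pool.
HOSTS = ["server1", "server2", "server3", "server4"]


def peer_probe_cmds(hosts):
    """Step 4: run from hosts[0] to add every other host to the trusted pool."""
    return [["gluster", "peer", "probe", h] for h in hosts[1:]]


def volume_create_cmd(name, bricks, replica=None, transport="rdma"):
    """Step 5: build a 'gluster volume create' argument list.

    bricks is a list of (host, path) pairs reusing the old volfile bricks.
    With replica=N, each run of N consecutive bricks forms one replica set,
    so the order must match the pairing used in the old volfiles.
    """
    if replica is not None and len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    cmd = ["gluster", "volume", "create", name]
    if replica is not None:
        cmd += ["replica", str(replica)]
    cmd += ["transport", transport]
    cmd += [f"{host}:{path}" for host, path in bricks]
    return cmd


def volume_start_cmd(name):
    """A volume must be started before it can be mounted (step 6)."""
    return ["gluster", "volume", "start", name]


if __name__ == "__main__":
    cmds = peer_probe_cmds(HOSTS)
    # Hypothetical volume names and brick paths.
    cmds.append(volume_create_cmd("distvol", [(h, "/export/dist") for h in HOSTS]))
    cmds.append(volume_create_cmd("repvol", [(h, "/export/rep") for h in HOSTS], replica=2))
    cmds += [volume_start_cmd("distvol"), volume_start_cmd("repvol")]
    for cmd in cmds:
        print(shlex.join(cmd))  # print only; nothing is executed
```

Printing instead of executing keeps the sketch safe to try. For the replicated volume, brick order is the part to double-check: server1 and server2 become one replica pair, server3 and server4 the other.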
Of course, we tested that scenario on VMs first, with no data issues. We
tested everything except... RDMA!
When we did the actual upgrade, everything went fine except mounting
the volumes. We got error messages like this one in the log files:
"E [rdma.c:4458:tcp_connect_finish] 0-zodiac-client-3: tcp connect to  failed (Connection refused)"
(note the two spaces between "connect to" and "failed")
That reminded me of an earlier problem with the subnet manager
running on the IB switch. But this time the switch wasn't to blame: IPoIB
was still working fine...
I scratched my head more than once, wondering what I could possibly have
forgotten. Then I searched for all the information I could find about RDMA
in 3.3.0.
Here is what I found:
- On page 123 of the "GlusterFS Administration Guide 3.3.0", a small note saying:
"NOTE: with 3.3.0 release, transport type 'rdma' and 'tcp,rdma' are not fully supported."
- On July 7, Ling Ho started a thread on this mailing list describing very similar
symptoms: http://www.mail-archive.com/[email protected]/msg09326.html
but he didn't get any answer.
In the rush of the upgrade, we weren't sure rolling back to 3.0.3 was a good option
(since we didn't know exactly which XFS extended attributes 3.3.0 had modified on
the backend filesystem). So we switched to TCP (over IPoIB).
It's working. We are now running 3.3.0. But we are no longer taking advantage
of RDMA.
So here are a few questions:
- Did I miss something that prevented me from using RDMA in 3.3.0?
- Is there a way to use RDMA in 3.3.0?
- Is there any official communication about the 3.3.0 RDMA issue?
- Is a 3.3.x release with RDMA support planned? If so, when?
- Will the RDMA transport be dropped in future releases?
Thanks!
(and yeah, despite that issue, I still love GlusterFS :-)
Philippe Muller
I just came back from a one-week vacation. Yes, I didn't get any reply
from the list, and I was not able to get RDMA working when the server is
configured for tcp,rdma. When I was testing, I had set up the
server with rdma only and completely missed this.
I ended up using TCP over IPoIB. The performance is much better than
TCP over 10 Gb/s. However, since I am in a mixed environment, I have
to do some static routing on the Gluster server, basically routing the
IPoIB subnet to the 10 Gb/s subnet that all the bricks are set up
on. Things have been working fine.
...
ling
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users