Re: [Gluster-users] rdma problems with glusterfs 3.1.0

Lana Deere Thu, 28 Oct 2010 09:51:12 -0700

Michael,

I was seeing a symptom at least similar to the "Transport endpoint is
not connected" message you list. In my case the cause was that the
OFED version I was running wasn't good enough. Once I moved to
OFED 1.5.1, the problem went away.
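
If it helps, here is a minimal sketch for checking which OFED release a
node is running, using the ofed_info script that OFED installs. It
assumes the first line of the ofed_info output is a banner such as
"OFED-1.3.1:", and the names ofed_version, version_ge and MIN_OFED are
made up for this example. The 1.5.1 threshold is only the release that
worked in my case, not a documented requirement.

```shell
#!/bin/sh
# Threshold taken from this thread (1.5.1 fixed the symptom); an assumption,
# not an official GlusterFS requirement.
MIN_OFED=1.5.1

# Extract the first dotted number from a banner such as "OFED-1.3.1:".
ofed_version() {
    printf '%s\n' "$1" | sed -n '1s/^[^0-9]*\([0-9][0-9.]*\).*/\1/p'
}

# Return 0 if dotted version $1 >= dotted version $2, comparing the
# fields numerically from left to right (missing fields count as 0).
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Only runs where ofed_info is actually installed.
if command -v ofed_info >/dev/null 2>&1; then
    have=$(ofed_version "$(ofed_info | head -n 1)")
    if version_ge "$have" "$MIN_OFED"; then
        echo "OFED $have is at least $MIN_OFED"
    else
        echo "OFED $have is older than $MIN_OFED"
    fi
fi
```

Running it on every brick node and on the client shows quickly whether
any machine is still on an older stack.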


You might look in the archives for the thread titled 'hanging "df" (3.1,
infiniband)' from Oct 19th, which contains the record of the diagnosis
and repair, in case it offers you any help.

.. Lana ([email protected])

On Thu, Oct 28, 2010 at 11:26 AM, Michael Galloway
<[email protected]> wrote:
> Good day all,
>
> I’ve built a new glusterfs volume using 20 nodes of one of my clusters, each
> with a 2TB SATA disk formatted with ext3 (the system is CentOS 5.2, x86_64).
> The volume is as follows:
>
> Volume Name: gfsvol1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 10 x 2 = 20
> Transport-type: rdma
> Bricks:
> Brick1: node002:/gfs
> Brick2: node003:/gfs
> Brick3: node004:/gfs
> Brick4: node005:/gfs
> Brick5: node006:/gfs
> Brick6: node007:/gfs
> Brick7: node008:/gfs
> Brick8: node009:/gfs
> Brick9: node010:/gfs
> Brick10: node011:/gfs
> Brick11: node012:/gfs
> Brick12: node013:/gfs
> Brick13: node014:/gfs
> Brick14: node015:/gfs
> Brick15: node016:/gfs
> Brick16: node017:/gfs
> Brick17: node019:/gfs
> Brick18: node020:/gfs
> Brick19: node021:/gfs
> Brick20: node022:/gfs
>
> The volume mounts on a client:
>
> [r...@moldyn ~]# mount -t glusterfs -o transport=rdma node002:/gfsvol1 /gfsvol1
> [r...@moldyn ~]# df
> Filesystem           1K-blocks      Used Available Use% Mounted on
> glusterfs#node002:/gfsvol1
>                     19228583424   2001664 18249825792   1% /gfsvol1
>
> I get this error on a copy into the gluster volume:
>
> [...@moldyn ~]$ cp -R pmemd/ /gfsvol1/mgx/pmemd
> cp: writing `/gfsvol1/mgx/pmemd/fmdrun.out': Transport endpoint is not connected
> cp: closing `/gfsvol1/mgx/pmemd/fmdrun.out': Resource temporarily unavailable
>
> It did copy the files; it just failed on that one:
>
> /gfsvol1/mgx/pmemd/:
> total 4357376
> -rw-rw-r-- 1 root root  514711552 Oct 27 13:02 fmdrun.out
> -rw-rw-r-- 1 mgx  mgx        4754 Oct 27 13:01 fmdrun.out.new
> -rw-rw-r-- 1 mgx  mgx   851832631 Oct 27 13:03 fmdrun.out_run1
> -rw-rw-r-- 1 mgx  mgx          81 Oct 27 13:01 mdinfo
> -rw------- 1 mgx  mgx         803 Oct 27 13:02 md.out
> -rw-rw-r-- 1 mgx  mgx         342 Oct 27 13:03 md.sub
> -rw-rw-r-- 1 mgx  mgx  1567835776 Oct 27 13:02 new.mdcrd
> -rw-rw-r-- 1 mgx  mgx  1522326100 Oct 27 13:01 new.mdcrd_run1
> -rw-rw-r-- 1 mgx  mgx      155957 Oct 27 13:02 new.rst
> -rw-rw-r-- 1 mgx  mgx      155957 Oct 27 13:01 old.rst
> drwxrwxr-x 3 mgx  mgx       40960 Oct 27 13:01 rbenew
> -rw-rw-r-- 1 mgx  mgx        1008 Oct 27 13:03 vp_mdrun.in
> -rw-rw-r-- 1 mgx  mgx       26190 Oct 27 13:01 vp.prmtop
> -rw-rw-r-- 1 mgx  mgx      348092 Oct 27 13:01 vp_wat.prmtop
>
> pmemd/:
> total 4711216
> -rw-rw-r-- 1 mgx mgx  876818259 Apr  2  2010 fmdrun.out
> -rw-rw-r-- 1 mgx mgx       4754 Mar 19  2010 fmdrun.out.new
> -rw-rw-r-- 1 mgx mgx  851832631 Mar  6  2010 fmdrun.out_run1
> -rw-rw-r-- 1 mgx mgx         81 Apr  2  2010 mdinfo
> -rw------- 1 mgx mgx        803 Apr  2  2010 md.out
> -rw-rw-r-- 1 mgx mgx        342 Mar 31  2010 md.sub
> -rw-rw-r-- 1 mgx mgx 1567835776 Apr  2  2010 new.mdcrd
> -rw-rw-r-- 1 mgx mgx 1522326100 Mar  6  2010 new.mdcrd_run1
> -rw-rw-r-- 1 mgx mgx     155957 Apr  2  2010 new.rst
> -rw-rw-r-- 1 mgx mgx     155957 Mar  9  2010 old.rst
> drwxrwxr-x 3 mgx mgx       4096 Mar 31  2010 rbenew
> -rw-rw-r-- 1 mgx mgx       1008 Mar  2  2010 vp_mdrun.in
> -rw-rw-r-- 1 mgx mgx      26190 Mar  2  2010 vp.prmtop
> -rw-rw-r-- 1 mgx mgx     348092 Mar  2  2010 vp_wat.prmtop
>
> The fmdrun.out file is truncated and has incorrect ownership.
>
> The volume was created following the 3.1 documentation.
>
> Where is the problem? Gluster? IB? My IB stack is OFED 1.3.1 and I have SDR
> Mellanox HCAs.
>
> --- michael
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>
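
For the truncated fmdrun.out in the listing above, a quick way to see
which files came across short is to compare sizes between the source and
the copy. This is only a rough sketch: compare_sizes is a made-up helper
name, it checks sizes only (not contents or ownership), and it assumes
both trees are reachable from the same machine.

```shell
#!/bin/sh
# Usage: compare_sizes SRC_DIR DST_DIR
# Prints every regular file under SRC_DIR whose counterpart under DST_DIR
# is missing or has a different size in bytes.
compare_sizes() {
    src=$1 dst=$2
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        f=${f#./}                                   # drop find's "./" prefix
        s=$(wc -c < "$src/$f" | tr -d ' ')          # source size in bytes
        if [ ! -f "$dst/$f" ]; then
            echo "missing: $f"
        else
            d=$(wc -c < "$dst/$f" | tr -d ' ')      # copy size in bytes
            [ "$s" -eq "$d" ] || echo "size differs: $f ($s vs $d bytes)"
        fi
    done
}
```

With the paths from the message above, the call would be
compare_sizes pmemd /gfsvol1/mgx/pmemd, which should flag fmdrun.out if
the listing is still accurate.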