Michael, I was seeing a symptom similar to the "Transport endpoint is not connected" message you list, and in my case the cause was an OFED release that was too old. Once I moved to OFED 1.5.1, the problem went away.
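A quick way to see which OFED release a node is running, as a minimal sketch rather than a tested procedure. It assumes the `ofed_info` script from the OFED distribution is on the PATH and that its first output line looks like "OFED-1.5.1:". Both are assumptions, as is treating 1.5.1 as the minimum (that is only the release that worked for me). The `version_ge` helper is a made-up name for illustration.

```shell
#!/bin/sh
# Release that made the problem go away in my case (not an official minimum).
MIN_OFED="1.5.1"

# version_ge A B: succeeds when dotted version A >= dotted version B.
# Uses field-wise numeric sort, so it also works with old coreutils.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}

if command -v ofed_info >/dev/null 2>&1; then
    # Assumption: the first line carries the release, e.g. "OFED-1.5.1:".
    # Strip everything before the first digit and after the version number.
    installed=$(ofed_info | head -n 1 | sed -e 's/^[^0-9]*//' -e 's/[^0-9.].*$//')
    if [ -z "$installed" ]; then
        echo "could not parse an OFED release from ofed_info output"
    elif version_ge "$installed" "$MIN_OFED"; then
        echo "OFED $installed is at least $MIN_OFED"
    else
        echo "OFED $installed is older than $MIN_OFED"
    fi
else
    echo "ofed_info not found; cannot determine the OFED release"
fi
```

Running this on each brick node and on the client would show whether any of them is still on an older release such as 1.3.1.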
You might look at the archives for a thread "hanging "df" (3.1, infiniband)" from Oct 19th which contains the record of diagnosis and repair, in case it offers you any help.

.. Lana ([email protected])

On Thu, Oct 28, 2010 at 11:26 AM, Michael Galloway <[email protected]> wrote:
> Good day all,
>
> I’ve built a new glusterfs volume using 20 nodes of one of my clusters, each
> with a 2TB SATA disk, formatted with ext3 (system is centos 5.2, x86_64).
> The volume is such:
>
> Volume Name: gfsvol1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 10 x 2 = 20
> Transport-type: rdma
> Bricks:
> Brick1: node002:/gfs
> Brick2: node003:/gfs
> Brick3: node004:/gfs
> Brick4: node005:/gfs
> Brick5: node006:/gfs
> Brick6: node007:/gfs
> Brick7: node008:/gfs
> Brick8: node009:/gfs
> Brick9: node010:/gfs
> Brick10: node011:/gfs
> Brick11: node012:/gfs
> Brick12: node013:/gfs
> Brick13: node014:/gfs
> Brick14: node015:/gfs
> Brick15: node016:/gfs
> Brick16: node017:/gfs
> Brick17: node019:/gfs
> Brick18: node020:/gfs
> Brick19: node021:/gfs
> Brick20: node022:/gfs
>
> The volume mounts on a client:
>
> [r...@moldyn ~]# mount -t glusterfs -o transport=rdma node002:/gfsvol1 /gfsvol1
> [r...@moldyn ~]# df
> Filesystem           1K-blocks      Used   Available Use% Mounted on
> glusterfs#node002:/gfsvol1
>                    19228583424   2001664 18249825792   1% /gfsvol1
>
> I get this error on a copy into the gluster volume:
>
> [...@moldyn ~]$ cp -R pmemd/ /gfsvol1/mgx/pmemd
> cp: writing `/gfsvol1/mgx/pmemd/fmdrun.out': Transport endpoint is not connected
> cp: closing `/gfsvol1/mgx/pmemd/fmdrun.out': Resource temporarily unavailable
>
> it did copy files, just failed on that one:
>
> /gfsvol1/mgx/pmemd/:
> total 4357376
> -rw-rw-r-- 1 root root  514711552 Oct 27 13:02 fmdrun.out
> -rw-rw-r-- 1 mgx  mgx        4754 Oct 27 13:01 fmdrun.out.new
> -rw-rw-r-- 1 mgx  mgx   851832631 Oct 27 13:03 fmdrun.out_run1
> -rw-rw-r-- 1 mgx  mgx          81 Oct 27 13:01 mdinfo
> -rw------- 1 mgx  mgx         803 Oct 27 13:02 md.out
> -rw-rw-r-- 1 mgx  mgx         342 Oct 27 13:03 md.sub
> -rw-rw-r-- 1 mgx  mgx  1567835776 Oct 27 13:02 new.mdcrd
> -rw-rw-r-- 1 mgx  mgx  1522326100 Oct 27 13:01 new.mdcrd_run1
> -rw-rw-r-- 1 mgx  mgx      155957 Oct 27 13:02 new.rst
> -rw-rw-r-- 1 mgx  mgx      155957 Oct 27 13:01 old.rst
> drwxrwxr-x 3 mgx  mgx       40960 Oct 27 13:01 rbenew
> -rw-rw-r-- 1 mgx  mgx        1008 Oct 27 13:03 vp_mdrun.in
> -rw-rw-r-- 1 mgx  mgx       26190 Oct 27 13:01 vp.prmtop
> -rw-rw-r-- 1 mgx  mgx      348092 Oct 27 13:01 vp_wat.prmtop
>
> pmemd/:
> total 4711216
> -rw-rw-r-- 1 mgx  mgx   876818259 Apr  2  2010 fmdrun.out
> -rw-rw-r-- 1 mgx  mgx        4754 Mar 19  2010 fmdrun.out.new
> -rw-rw-r-- 1 mgx  mgx   851832631 Mar  6  2010 fmdrun.out_run1
> -rw-rw-r-- 1 mgx  mgx          81 Apr  2  2010 mdinfo
> -rw------- 1 mgx  mgx         803 Apr  2  2010 md.out
> -rw-rw-r-- 1 mgx  mgx         342 Mar 31  2010 md.sub
> -rw-rw-r-- 1 mgx  mgx  1567835776 Apr  2  2010 new.mdcrd
> -rw-rw-r-- 1 mgx  mgx  1522326100 Mar  6  2010 new.mdcrd_run1
> -rw-rw-r-- 1 mgx  mgx      155957 Apr  2  2010 new.rst
> -rw-rw-r-- 1 mgx  mgx      155957 Mar  9  2010 old.rst
> drwxrwxr-x 3 mgx  mgx        4096 Mar 31  2010 rbenew
> -rw-rw-r-- 1 mgx  mgx        1008 Mar  2  2010 vp_mdrun.in
> -rw-rw-r-- 1 mgx  mgx       26190 Mar  2  2010 vp.prmtop
> -rw-rw-r-- 1 mgx  mgx      348092 Mar  2  2010 vp_wat.prmtop
>
> The fmdrun.out file is truncated and incorrect ownership.
>
> The volume was created following the 3.1 docu.
>
> Where is the problem at? Gluster? IB? my ib is ofed 1.3.1 and I have SDR
> mellenox HCA’s.
>
> --- michael
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
