Seems like heavy FINODELK contention. As a diagnostic step, can you try disabling eager-locking and check the write performance again (gluster volume set $name cluster.eager-lock off)?
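If it helps, here is roughly the sequence I have in mind; a sketch only, using the homegfs volume name from your volume info (the profile file names are just placeholders, and the reset at the end puts eager-lock back to its default since this is purely a diagnostic):

    # capture a profile baseline with the current settings
    gluster volume profile homegfs start
    # ... run the restart-file write once, then save the numbers ...
    gluster volume profile homegfs info > profile_eager_on.txt

    # disable eager-lock and repeat the same write from the fuse mount
    gluster volume set homegfs cluster.eager-lock off
    # ... re-run the restart-file write ...
    gluster volume profile homegfs info > profile_eager_off.txt

    # put the option back to its default when done
    gluster volume reset homegfs cluster.eager-lock

If the FINODELK counts and latencies drop noticeably in the second profile, that points at eager-lock; if not, we can look elsewhere in the write path.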
On Tue, Aug 5, 2014 at 11:44 AM, David F. Robinson <[email protected]> wrote:

> Forgot to attach profile info in previous email. Attached...
>
> David
>
> ------ Original Message ------
> From: "David F. Robinson" <[email protected]>
> To: [email protected]
> Sent: 8/5/2014 2:41:34 PM
> Subject: Fw: Re: Corvid gluster testing
>
> I have been testing some of the fixes that Pranith incorporated into the
> 3.5.2-beta to see how they perform for moderate levels of I/O. All of the
> stability issues that I had seen in previous versions seem to have been
> fixed in 3.5.2; however, there still seem to be some significant
> performance issues. Pranith suggested that I send this to the
> gluster-devel email list, so here goes:
>
> I am running an MPI job that saves a restart file to the gluster file
> system. When I use the following in my fstab to mount the gluster volume,
> the I/O time for the 2.5GB file is roughly 45 seconds:
>
>     gfsib01a.corvidtec.com:/homegfs  /homegfs  glusterfs  transport=tcp,_netdev  0 0
>
> When I switch this to use the NFS protocol (see below), the I/O time is
> 2.5 seconds:
>
>     gfsib01a.corvidtec.com:/homegfs  /homegfs  nfs  vers=3,intr,bg,rsize=32768,wsize=32768  0 0
>
> The read times for gluster are 10-20% faster than NFS, but the write times
> are almost 20x slower.
>
> I am running SL 6.4 and glusterfs-3.5.2-0.1.beta1.el6.x86_64...
>
>     [root@gfs01a glusterfs]# gluster volume info homegfs
>     Volume Name: homegfs
>     Type: Distributed-Replicate
>     Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>     Status: Started
>     Number of Bricks: 2 x 2 = 4
>     Transport-type: tcp
>     Bricks:
>     Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>     Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>     Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>     Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>
> David
>
> ------ Forwarded Message ------
> From: "Pranith Kumar Karampuri" <[email protected]>
> To: "David Robinson" <[email protected]>
> Cc: "Young Thomas" <[email protected]>
> Sent: 8/5/2014 2:25:38 AM
> Subject: Re: Corvid gluster testing
>
> [email protected] is the email-id for the mailing list. We
> should probably start with the initial run numbers and the comparison for
> the glusterfs and NFS mounts, maybe something like:
>
> glusterfs mount: 90 minutes
> nfs mount: 25 minutes
>
> Profile outputs, volume config, number of mounts, and hardware
> configuration would also be a good start.
>
> Pranith
>
> On 08/05/2014 09:28 AM, David Robinson wrote:
>
> Thanks, Pranith
>
> ===============================
> David F. Robinson, Ph.D.
> President - Corvid Technologies
> 704.799.6944 x101 [office]
> 704.252.1310 [cell]
> 704.799.7974 [fax]
> [email protected]
> http://www.corvidtechnologies.com
>
> On Aug 4, 2014, at 11:22 PM, Pranith Kumar Karampuri <[email protected]> wrote:
>
> On 08/05/2014 08:33 AM, Pranith Kumar Karampuri wrote:
> On 08/05/2014 08:29 AM, David F. Robinson wrote:
> On 08/05/2014 12:51 AM, David F. Robinson wrote:
>
> No. I don't want to use NFS. It eliminates most of the benefits of why I
> want to use gluster: failover redundancy of the pair, load balancing, etc.
>
> What is the meaning of 'failover redundancy of the pair, load balancing'?
> Could you elaborate more? smb/nfs/glusterfs are just access protocols that
> gluster supports; the functionality is almost the same.
>
> Here is my understanding. Please correct me where I am wrong.
> With gluster, if I am doing a write and one of the replicated pairs goes
> down, there is no interruption to the I/O. The failover is handled by
> gluster and the fuse client. This isn't done if I use an NFS mount, unless
> the component of the pair that goes down isn't the one I used for the
> mount.
>
> With NFS, I have to mount one of the bricks. So, if I have gfs01a,
> gfs01b, gfs02a, gfs02b, gfs03a, gfs03b, etc. and my fstab mounts gfs01a,
> it is my understanding that all of my I/O will go through gfs01a, which
> then gets distributed to all of the other bricks, and gfs01a throughput
> becomes a bottleneck. Whereas if I do a gluster mount using fuse, the load
> balancing is handled on the client side, not the server side. If I have
> 1000 nodes accessing 20 gluster bricks, I need the load-balancing aspect.
> I cannot have all traffic going through the network interface on a single
> brick.
>
> If I am wrong with the above assumptions, I guess my question is why would
> one ever use the gluster mount instead of NFS and/or Samba?
>
> Tom: feel free to chime in if I have missed anything.
>
> I see your point now. Yes, the gluster server where you did the mount is
> kind of a bottleneck.
>
> Now that we have established that the problem is in the clients/protocols,
> you should send out a detailed mail on gluster-devel and see if anyone can
> help you with the performance xlators that might improve it a bit more. My
> area of expertise is more on replication; I am sub-maintainer for the
> replication and locks components. I also know the connection-management
> and io-threads related issues which lead to hangs, as I worked on them
> before. Performance xlators are a black box to me.
>
> Performance xlators are enabled only on the fuse gluster stack. On NFS
> server mounts we disable all the performance xlators except write-behind,
> as the NFS client does a lot of things to improve performance itself. I
> suggest you guys follow up more on gluster-devel.
>
> Appreciate all the help you did for improving the product :-). Thanks a
> ton!
>
> Pranith
>
> David (Sent from mobile)
>
> ===============================
> David F. Robinson, Ph.D.
> President - Corvid Technologies
> 704.799.6944 x101 [office]
> 704.252.1310 [cell]
> 704.799.7974 [fax]
> [email protected]
> http://www.corvidtechnologies.com
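For anyone who wants to reproduce the fuse-vs-NFS write gap above outside the MPI job, here is a minimal A/B sketch using the same server and volume names from the thread (the mount points and dd parameters are illustrative, not taken from the original test):

    # mount the same volume once over fuse and once over gluster's NFS server
    mkdir -p /mnt/homegfs_fuse /mnt/homegfs_nfs
    mount -t glusterfs gfsib01a.corvidtec.com:/homegfs /mnt/homegfs_fuse
    mount -t nfs -o vers=3,intr,bg,rsize=32768,wsize=32768 \
        gfsib01a.corvidtec.com:/homegfs /mnt/homegfs_nfs

    # time a ~2.5GB sequential write on each mount, flushing before dd exits
    time dd if=/dev/zero of=/mnt/homegfs_fuse/ddtest bs=1M count=2560 conv=fsync
    time dd if=/dev/zero of=/mnt/homegfs_nfs/ddtest  bs=1M count=2560 conv=fsync

A single-stream dd is of course not the same I/O pattern as the MPI restart write, but if it reproduces the ~20x gap it gives everyone on the list an easy way to test candidate option changes.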
_______________________________________________
Gluster-devel mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
