Just to clarify a little, there are two cases where I was evaluating performance.

1) The first case, which Pranith was working on, involved 20 nodes with 4 processors each, for a total of 80 processors. Each processor does its own independent I/O. The files are roughly 100-200MB each and there are several hundred of them. When I mounted the gluster volume using fuse, the I/O took 1.5 hours. When I mounted the same volume using NFS, it took 30 minutes. Note that in order to get the fuse-mounted file system down to 1.5 hours, I had to get rid of the replicated volume (this was done during troubleshooting with Pranith to rule out other possible issues). The timing was significantly worse (3+ hours) when I was using a replicated pair.

2) The second case is the output of a single larger file (roughly 2.5TB). Here the gluster fuse mount takes 60 seconds (although I got that down to 52 seconds with some gluster parameter tuning), while the NFS mount takes 38 seconds. I sent the results of this to the developer list first because this case is much easier to test (roughly 50 seconds versus what could be 3+ hours).
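If it helps anyone reproduce the single-file case, something along these lines should work (purely illustrative; /homegfs is the mount point from my fstab and the size is a placeholder, not the actual 2.5TB run):

    # time a large sequential write against the mounted volume; conv=fsync makes
    # dd flush before reporting, so the time reflects the full write-out
    time dd if=/dev/zero of=/homegfs/ddtest.out bs=1M count=10240 conv=fsync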

I am headed out of town for a few days and will not be able to do additional testing until Monday. For the second case, I will turn off cluster.eager-lock and send the results to the email list. If there is any other testing that you would like to see for the first case, let me know and I will be happy to run the tests and send in the results...
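For reference, that test is just the command Avati suggests further down plus gluster's built-in profiling (the volume name homegfs is taken from the volume info later in this thread):

    # disable eager locking, then rerun the job
    gluster volume set homegfs cluster.eager-lock off

    # capture a fresh per-fop profile around the run
    gluster volume profile homegfs start
    gluster volume profile homegfs info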

Sorry for the confusion...

David


------ Original Message ------
From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
To: "Anand Avati" <av...@gluster.org>
Cc: "David F. Robinson" <david.robin...@corvidtec.com>; "Gluster Devel" <gluster-devel@gluster.org>
Sent: 8/6/2014 9:51:11 PM
Subject: Re: [Gluster-devel] Fw: Re: Corvid gluster testing


On 08/07/2014 07:18 AM, Anand Avati wrote:
It would be worth checking the perf numbers without -o acl (in case it was enabled, as seen in the other gid thread). The client-side -o acl mount option can have a negative impact on performance because of the increased number of up-calls from FUSE for access().
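For what it's worth, whether acl is in play shows up directly in the fstab/mount options. The first line below is the current entry from David's fstab later in this thread; the second is a hypothetical variant with client-side ACLs enabled:

    gfsib01a.corvidtec.com:/homegfs /homegfs glusterfs transport=tcp,_netdev 0 0
    gfsib01a.corvidtec.com:/homegfs /homegfs glusterfs acl,transport=tcp,_netdev 0 0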
Actually it is all write-intensive.
Here are the numbers they gave me from earlier runs:
%-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
---------   -----------   -----------   -----------   ------------        ----
     0.00       0.00 us       0.00 us       0.00 us             99      FORGET
     0.00       0.00 us       0.00 us       0.00 us           1093     RELEASE
     0.00       0.00 us       0.00 us       0.00 us            468  RELEASEDIR
     0.00      60.00 us      26.00 us     107.00 us              4     SETATTR
     0.00      91.56 us      42.00 us     157.00 us             27      UNLINK
     0.00      20.75 us      12.00 us      55.00 us            132    GETXATTR
     0.00      19.03 us       9.00 us      95.00 us            152    READLINK
     0.00      43.19 us      12.00 us     106.00 us             83        OPEN
     0.00      18.37 us       8.00 us      92.00 us            257      STATFS
     0.00      32.42 us      11.00 us     118.00 us            322     OPENDIR
     0.00      36.09 us       5.00 us     109.00 us            359       FSTAT
     0.00      51.14 us      37.00 us     183.00 us            663      RENAME
     0.00      33.32 us       6.00 us     123.00 us           1451        STAT
     0.00     821.79 us      21.00 us   22678.00 us             84        READ
     0.00      34.88 us       3.00 us     139.00 us           2326       FLUSH
     0.01     789.33 us      72.00 us   64054.00 us            347      CREATE
     0.01    1144.63 us      43.00 us  280735.00 us            337   FTRUNCATE
     0.01      47.82 us      16.00 us   19817.00 us          16513      LOOKUP
     0.02     604.85 us      11.00 us    1233.00 us           1423    READDIRP
    99.95      17.51 us       6.00 us  212701.00 us      300715967       WRITE

    Duration: 5390 seconds
   Data Read: 1495257497 bytes
Data Written: 166546887668 bytes
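(For rough scale, 166,546,887,668 bytes written over 5,390 seconds works out to roughly 30 MB/s of aggregate write throughput.)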

Pranith

Thanks


On Wed, Aug 6, 2014 at 6:26 PM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

On 08/07/2014 06:48 AM, Anand Avati wrote:



On Wed, Aug 6, 2014 at 6:05 PM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
We checked this performance with plain distribute as well and on nfs it gave 25 minutes where as on nfs it gave around 90 minutes after disabling throttling in both situations.

This sentence is very confusing. Can you please state it more clearly?
Sorry :-D.
We checked this performance on a plain distribute volume with throttling disabled.
On NFS the run took 25 minutes.
On fuse the run took 90 minutes.

Pranith


Thanks


I was wondering if any of you guys know what could contribute to this difference.

Pranith

On 08/07/2014 01:33 AM, Anand Avati wrote:
Seems like heavy FINODELK contention. As a diagnostic step, can you try disabling eager-locking and check the write performance again (gluster volume set $name cluster.eager-lock off)?


On Tue, Aug 5, 2014 at 11:44 AM, David F. Robinson <david.robin...@corvidtec.com> wrote:
Forgot to attach profile info in previous email.  Attached...

David


------ Original Message ------
From: "David F. Robinson" <david.robin...@corvidtec.com>
To: gluster-devel@gluster.org
Sent: 8/5/2014 2:41:34 PM
Subject: Fw: Re: Corvid gluster testing

I have been testing some of the fixes that Pranith incorporated into the 3.5.2-beta to see how they performed for moderate levels of I/O. All of the stability issues that I had seen in previous versions seem to have been fixed in 3.5.2; however, there still seem to be some significant performance issues. Pranith suggested that I send this to the gluster-devel email list, so here goes:

I am running an MPI job that saves a restart file to the gluster file system. When I use the following in my fstab to mount the gluster volume, the I/O time for the 2.5GB file is roughly 45 seconds.

gfsib01a.corvidtec.com:/homegfs /homegfs glusterfs transport=tcp,_netdev 0 0

When I switch this to use the NFS protocol (see below), the I/O time is 2.5 seconds.

gfsib01a.corvidtec.com:/homegfs /homegfs nfs vers=3,intr,bg,rsize=32768,wsize=32768 0 0

The read times for gluster are 10-20% faster than NFS, but the write times are almost 20x slower.

I am running SL 6.4 and glusterfs-3.5.2-0.1.beta1.el6.x86_64...

[root@gfs01a glusterfs]# gluster volume info homegfs
Volume Name: homegfs
Type: Distributed-Replicate
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs

David

------ Forwarded Message ------
From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
To: "David Robinson" <david.robin...@corvidtec.com>
Cc: "Young Thomas" <tom.yo...@corvidtec.com>
Sent: 8/5/2014 2:25:38 AM
Subject: Re: Corvid gluster testing

gluster-devel@gluster.org is the email address for the mailing list. We should probably start with the initial run numbers and the comparison between the glusterfs and NFS mounts. Maybe something like:

glusterfs mount: 90 minutes
nfs mount: 25 minutes

Along with the profile outputs, volume config, number of mounts, and hardware configuration, that should be a good start.
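Assuming the volume is homegfs, something like the following should cover most of that:

    # volume configuration and brick layout
    gluster volume info homegfs
    gluster volume status homegfs

    # per-fop latency profile: start it, run the job, then dump the numbers
    gluster volume profile homegfs start
    gluster volume profile homegfs info

    # list of connected clients (i.e. the number of mounts)
    gluster volume status homegfs clients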

Pranith

On 08/05/2014 09:28 AM, David Robinson wrote:
Thanks, Pranith


===============================
David F. Robinson, Ph.D.
President - Corvid Technologies
704.799.6944 x101 [office]
704.252.1310 [cell]
704.799.7974 [fax]
david.robin...@corvidtec.com
http://www.corvidtechnologies.com/

On Aug 4, 2014, at 11:22 PM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:


On 08/05/2014 08:33 AM, Pranith Kumar Karampuri wrote:

On 08/05/2014 08:29 AM, David F. Robinson wrote:
On 08/05/2014 12:51 AM, David F. Robinson wrote:
No, I don't want to use NFS. It eliminates most of the reasons I want to use gluster: failover redundancy of the replicated pair, load balancing, etc.
What do you mean by 'failover redundancy of the pair, load balancing'? Could you elaborate? SMB/NFS/glusterfs are just access protocols that gluster supports; the functionality is almost the same.
Here is my understanding. Please correct me where I am wrong.

With gluster, if I am doing a write and one member of a replicated pair goes down, there is no interruption to the I/O. The failover is handled by gluster and the fuse client. This isn't the case if I use an NFS mount, unless the member of the pair that goes down isn't the one I used for the mount.

With NFS, I will have to mount one of the servers. So, if I have gfs01a, gfs01b, gfs02a, gfs02b, gfs03a, gfs03b, etc. and my fstab mounts gfs01a, it is my understanding that all of my I/O will go through gfs01a, which then distributes it to all of the other bricks. Gfs01a's throughput becomes a bottleneck. Whereas if I do a gluster mount using fuse, the load balancing is handled on the client side, not the server side. If I have 1000 nodes accessing 20 gluster bricks, I need the load-balancing aspect. I cannot have all traffic going through the network interface of a single brick.

If I am wrong about the above assumptions, I guess my question is: why would one ever use the gluster mount instead of NFS and/or Samba?

Tom: feel free to chime in if I have missed anything.
I see your point now. Yes, the gluster server where you did the mount is kind of a bottleneck.
Now that we have established the problem is in the clients/protocols, you should send out a detailed mail on gluster-devel and see if anyone can help you with the performance xlators to improve it a bit more. My area of expertise is more on replication; I am the sub-maintainer for the replication and locks components. I also know the connection-management/io-threads related issues that lead to hangs, as I worked on them before. Performance xlators are a black box to me.

Performance xlators are enabled only on the fuse gluster stack. On NFS server mounts we disable all the performance xlators except write-behind, as the NFS client already does a lot to improve performance. I suggest you guys follow up more on gluster-devel.
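For context, those client-side performance xlators map to per-volume options along these lines (homegfs is just the example volume name here, and this is only meant to show the knobs, not a tuning recommendation):

    gluster volume set homegfs performance.write-behind on
    gluster volume set homegfs performance.read-ahead on
    gluster volume set homegfs performance.io-cache on
    gluster volume set homegfs performance.quick-read on
    gluster volume set homegfs performance.stat-prefetch on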

Appreciate all the help you have been doing to improve the product :-). Thanks a ton!
Pranith
David (Sent from mobile)

===============================
David F. Robinson, Ph.D.
President - Corvid Technologies
704.799.6944 x101 [office]
704.252.1310 [cell]
704.799.7974 [fax]
david.robin...@corvidtec.com
http://www.corvidtechnologies.com/










_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
