To give some context here, the environment we were testing this in looks
like this:

* 2 x XenServer hosts, Dell R430s with Xeon E5-2630 v3 CPUs and Intel X520 10G
NICs dedicated to the iSCSI traffic for GFS2 (only using one per host)
* Dedicated Linux filer packed with SSDs and 128GB of RAM. The native storage
can sustain > 5GB/s write throughput and the host (currently) has a bonded
pair of X710 10G NICs to serve the hosts.

So basically the storage is significantly faster than the network and will not 
be the bottleneck in these tests.

Whether what we observe here will change when we upgrade the filer to six 10G
NICs (planned in the next few weeks) remains to be seen. Obviously we'll need
to add some more hosts to the cluster, but we have another 10 in the rack so
that isn't an issue.

Mark.

-----Original Message-----
From: Bob Peterson <[email protected]> 
Sent: 28 September 2018 15:00
To: Tim Smith <[email protected]>
Cc: Steven Whitehouse <[email protected]>; Mark Syms <[email protected]>; 
[email protected]; Ross Lagerwall <[email protected]>
Subject: Re: [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance 
improvements

----- Original Message -----
> I think what's happening for us is that the work that needs to be done 
> to release an rgrp lock is happening pretty fast and is about the same 
> in all cases, so the stats are not providing a meaningful distinction. 
> We see the same lock (or small number of locks) bouncing back and 
> forth between nodes with neither node seeming to consider them 
> congested enough to avoid, even though the FS is <50% full and there must be 
> plenty of other non-full rgrps.
> 
> --
> Tim Smith <[email protected]>

Hi Tim,

Interesting.
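
The "congested enough to avoid" decision you describe comes down to
gfs2_rgrp_congested() in rgrp.c, which compares the rgrp glock's smoothed
round-trip time for blocking DLM requests against the average over all rgrp
glocks, scaled by the variance estimates. A simplified sketch of that test
(from memory, so it won't match any particular kernel exactly):

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Simplified model of the rgrp congestion test, not the kernel code:
     *   l_srttb  - this glock's smoothed round-trip time, blocking requests
     *   a_srttb  - average SRTTB over all rgrp glocks (per-cpu stats)
     *   var      - combined SRTT variance estimates (glock + global)
     *   l_dcount - DLM request count for this glock
     *   r_dcount - DLM request count for rgrp glocks overall
     *   loops    - which pass of the allocator we are on
     */
    static bool rgrp_congested_sketch(uint64_t l_srttb, uint64_t a_srttb,
                                      uint64_t var, uint64_t l_dcount,
                                      uint64_t r_dcount, int loops)
    {
            int64_t diff;
            uint64_t sqr_diff;

            /* Too few samples: never call it congested. */
            if (l_dcount < 1 || r_dcount < 1 || a_srttb == 0)
                    return false;

            diff = (int64_t)(a_srttb - l_srttb);
            sqr_diff = (uint64_t)(diff * diff);

            /* Widen the window when samples are few or on a retry pass. */
            var *= 2;
            if (l_dcount < 8 || r_dcount < 8)
                    var *= 2;
            if (loops == 1)
                    var *= 2;

            /* Congested only if this glock is slower than the average by
             * more than the scaled variance. */
            return diff < 0 && sqr_diff > var;
    }

If DLM round trips are uniformly quick on both nodes, l_srttb never stands
out from a_srttb and nothing ever trips that test, which would line up with
the ping-pong on a single rgrp that you're seeing.
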
I've done experiments in the past where I allowed resource group glocks to take
advantage of the "minimum hold time", which today is only used for inode glocks.
In my experiments it made no appreciable difference that I can recall, but it
might be an interesting experiment for you to try.
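
For reference, the gating is in gfs2_glock_dq() (and similarly in
gfs2_glock_cb() for remote demote requests). From memory it looks roughly
like the excerpt below, so the experiment amounts to letting LM_TYPE_RGRP
through that check as well:

        /* Approximate excerpt from gfs2_glock_dq(); details vary by
         * kernel version.  Only inode glocks get the minimum hold time
         * before the demote work is queued. */
        if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
            !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) &&
            gl->gl_name.ln_type == LM_TYPE_INODE)
                delay = gl->gl_hold_time;

        /* The experiment: also honour the hold time for rgrp glocks,
         * i.e. accept LM_TYPE_RGRP in the check above (and in the
         * matching LM_TYPE_INODE test in gfs2_glock_cb()). */
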

Steve's right that we need to be careful not to improve one aspect of
performance at the cost of another, such as easing intra-node congestion at
the expense of inter-node congestion.

Regards,

Bob Peterson
