Hi Rik, Nice clarity and detail in the description. Thanks!
inline...

On Wed, Mar 7, 2018 at 8:29 PM, Rik Theys <rik.th...@esat.kuleuven.be> wrote:

> Hi,
>
> We are looking into replacing our current storage solution and are
> evaluating gluster for this purpose. Our current solution uses a SAN
> with two servers attached that serve samba and NFS 4. Clients connect to
> those servers using NFS or SMB. All users' home directories live on this
> server.
>
> I would like to have some insight in who else is using gluster for home
> directories for about 500 users and what performance they get out of the
> solution. Which connectivity method are you using on the clients
> (gluster native, nfs, smb)? Which volume options do you have configured
> for your gluster volume? What hardware are you using? Are you using
> snapshots and/or quota? If so, any number on performance impact?
>
> The solution I had in mind for our setup is multiple servers/bricks with
> replica 3 arbiter 1 volume where each server is also running nfs-ganesha
> and samba in HA. Clients would be connecting to one of the nfs servers
> (dns round robin). In this case the nfs servers would be the gluster
> clients. Gluster traffic would go over a dedicated network with 10G and
> jumbo frames.
>
> I'm currently testing gluster (3.12, now 3.13) on older machines and
> have created a replica 3 arbiter 1 volume 2x(2+1). I seem to run in all
> sorts of (performance) problems. I must be doing something wrong but
> I've tried all sorts of benchmarks and nothing seems to make my setup
> live up to what I would expect from this hardware.
>
> * I understand that gluster only starts to work well when multiple
> clients are connecting in parallel, but I did expect the single client
> performance to be better.
>
> * Unpacking the linux-4.15.7.tar.xz file on the brick XFS filesystem
> followed by a sync takes about 1 minute.
> Doing the same on the gluster
> volume using the fuse client (client is one of the brick servers) takes
> over 9 minutes and neither disk nor cpu nor network are reaching their
> bottleneck. Doing the same over NFS-ganesha (client is a workstation
> connected through gbit) takes even longer (more than 30min!?).
>
> I understand that unpacking a lot of small files may be the worst
> workload for a distributed filesystem, but when I look at the file sizes
> of the files in our users' home directories, more than 90% is smaller
> than 1MB.
>
> * A file copy of a 300GB file over NFS 4 (nfs-ganesha) starts fast
> (90MB/s) and then drops to 20MB/s. When I look at the servers during the
> copy, I don't see where the bottleneck is as the cpu, disk and network
> are not maxing out (on none of the bricks). When the same client copies
> the file to our current NFS storage it is limited by the gbit network
> connection of the client.
>

Both untar and cp are single-threaded, which means throughput is mostly dictated by latency. Latency is generally higher in a distributed FS; nfs-ganesha has an extra hop to the backend, and hence higher latency for most operations compared to glusterfs-fuse.

You don't necessarily need multiple clients for good performance with gluster. Many multi-threaded benchmarks give good performance from a single client. Here, for example, if you run multiple copy commands in parallel from the same client, I'd expect your aggregate transfer rate to improve.

It's been a long while since I looked at nfs-ganesha. But in terms of upper bounds for throughput tests: data needs to flow over the client->nfs-server link, and then, depending on which servers the file is located on, either 1x (if the nfs-ganesha node is also hosting one copy of the file, and neglecting arbiter) or 2x over the s2s link. With 1Gbps links, that means an upper bound between 125 MB/s and 62.5 MB/s, in the steady state, unless I miscalculated.
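In case it helps to check the arithmetic, here is a rough sketch of that upper bound in Python. The function names are made up for illustration. It assumes decimal units (1 Gbps = 125 MB/s), ignores arbiter traffic and protocol overhead, and treats the s2s link of the nfs-ganesha node as shared by every data copy that has to go to a remote brick.

```python
def link_capacity_mb_per_s(link_gbps: float) -> float:
    """Raw link capacity in MB/s, decimal units (1 Gbps = 1000 Mbit/s)."""
    return link_gbps * 1000 / 8


def write_upper_bound_mb_per_s(client_link_gbps: float,
                               s2s_link_gbps: float,
                               remote_copies: int) -> float:
    """Steady-state upper bound for one write stream through nfs-ganesha.

    remote_copies is the number of data copies the nfs-ganesha node must
    push over its s2s link: 1 if it hosts one copy itself, 2 if it hosts
    none. Arbiter traffic is neglected (assumption).
    """
    client_cap = link_capacity_mb_per_s(client_link_gbps)
    if remote_copies == 0:
        return client_cap
    # Every remote copy shares the same s2s link of the nfs-ganesha node.
    s2s_cap = link_capacity_mb_per_s(s2s_link_gbps) / remote_copies
    return min(client_cap, s2s_cap)


# Test setup from this thread: 1 Gbit client link, 1 Gbit s2s link.
best_case = write_upper_bound_mb_per_s(1.0, 1.0, remote_copies=1)
worst_case = write_upper_bound_mb_per_s(1.0, 1.0, remote_copies=2)
print(f"upper bound: {worst_case} to {best_case} MB/s")
```

With the same assumptions, a 10G s2s network as in your planned setup would move the bound back to the 1 Gbit client link in both cases.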
-- Manoj

>
> * I had the 'cluster.optimize-lookup' option enabled but ran into all
> sorts of issues where ls is showing either the wrong files (content of a
> different directory), or claiming a directory does not exist when mkdir
> says it already exists... I current have the following options set:
>
> server.outstanding-rpc-limit: 256
> client.event-threads: 4
> performance.io-thread-count: 16
> performance.parallel-readdir: on
> server.event-threads: 4
> performance.cache-size: 2GB
> performance.rda-cache-limit: 128MB
> performance.write-behind-window-size: 8MB
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> network.inode-lru-limit: 500000
> performance.nl-cache-timeout: 600
> performance.nl-cache: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> transport.address-family: inet
> nfs.disable: on
> cluster.enable-shared-storage: enable
>
> The brick servers have 2 dual-core cpu's so I've set the client and
> server event threads to 4.
>
> * When using nfs-ganesha I run into bugs that makes me wonder who is
> using nfs-ganesha with gluster and why are they not hitting these bugs:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1543996
> https://bugzilla.redhat.com/show_bug.cgi?id=1405147
>
> * nfs-ganesha does not have the 'async' option that kernel nfs has. I
> can understand why they don't want to implement this feature, but do
> wonder how others are increasing their nfs-ganesha performance. I've put
> some SSD's in each brick and have them configured as lvmcache to the
> bricks. This setup only increases throughput once the data is on the ssd
> and not for just-written data.
>
> Regards,
>
> Rik
>
> 4 servers with 2 1Gbit nics (one for the client traffic, one for s2s
> traffic with jumbo frames enabled). Each server has two disks (bricks).
>
> ioping from the nfs client shows the following latencies:
> min/avg/max/mdev = 695.2 us / 2.17 ms / 7.05 ms / 1.92 ms
>
> ping rtt from client to nfs-ganesha server:
> rtt min/avg/max/mdev = 0.106/1.551/6.195/2.098 ms
>
> ioping on the volume fuse mounted from a brick:
> min/avg/max/mdev = 557.0 us / 824.4 us / 2.68 ms / 421.9 us
>
> ioping on the brick xfs filesystem:
> min/avg/max/mdev = 275.2 us / 515.2 us / 12.4 ms / 1.21 ms
>
> Are these normal numbers?
>
>
> _______________________________________________
> Gluster-users mailing list
> Glusterfirstname.lastname@example.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>