Hello everybody!

I have 3 Gluster servers (Gluster 3.12.6, CentOS 7.2; they are actually virtual machines located on 3 separate physical XenServer 7.1 hosts).
They are all connected via an InfiniBand network; iperf3 shows around 23 Gbit/s of bandwidth between each pair of them. Each server has 3 HDDs combined into a stripe-3 LVM2 thin pool, with a logical volume created on top of it and formatted with XFS. gluster top reports the following throughput:

root@fsnode2 ~ $ gluster volume top r3vol write-perf bs 4096 count 524288 list-cnt 0
> Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
> Throughput 631.82 MBps time 3.3989 secs
> Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
> Throughput 566.96 MBps time 3.7877 secs
> Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
> Throughput 546.65 MBps time 3.9285 secs

root@fsnode2 ~ $ gluster volume top r2vol write-perf bs 4096 count 524288 list-cnt 0
> Brick: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
> Throughput 539.60 MBps time 3.9798 secs
> Brick: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
> Throughput 580.07 MBps time 3.7021 secs

There are two purely replicated volumes ('replica 2' and 'replica 3'); the 'replica 2' volume is for testing purposes only.
> Volume Name: r2vol
> Type: Replicate
> Volume ID: 4748d0c0-6bef-40d5-b1ec-d30e10cfddd9
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
> Brick2: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
> Options Reconfigured:
> nfs.disable: on
>
> Volume Name: r3vol
> Type: Replicate
> Volume ID: b0f64c28-57e1-4b9d-946b-26ed6b499f29
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
> Brick2: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
> Brick3: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
> Options Reconfigured:
> nfs.disable: on

The client is also Gluster 3.12.6, a CentOS 7.3 virtual machine, using a FUSE mount:

root@centos7u3-nogdesktop2 ~ $ mount | grep gluster
> gluster-host.ibnet:/r2vol on /mnt/gluster/r2 type fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> gluster-host.ibnet:/r3vol on /mnt/gluster/r3 type fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

The problem is that there is a significant performance loss with smaller block sizes.
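To make the effect easy to reproduce, the dd runs can be wrapped in a small block-size sweep. This is just a sketch: TARGET is an assumption and should point at the FUSE mount (e.g. /mnt/gluster/r3); it defaults to a temp dir here so the script can be dry-run anywhere.

```shell
#!/bin/sh
# Block-size sweep with dd against a target directory.
# TARGET defaults to a local temp dir; pass the FUSE mount point
# (e.g. /mnt/gluster/r3) as the first argument for a real test.
TARGET="${1:-$(mktemp -d)}"
TOTAL=$((64 * 1024 * 1024))   # bytes per run; raise to 1 GiB for a real test

for bs in 4096 65536 524288; do
    count=$((TOTAL / bs))
    # conv=fsync forces a flush before dd reports, so cached writes
    # don't inflate the throughput number
    dd if=/dev/zero of="$TARGET/ddtest.$bs" bs="$bs" count="$count" conv=fsync 2>&1 \
        | tail -n 1 | sed "s|^|bs=$bs: |"
    rm -f "$TARGET/ddtest.$bs"
done
```

Each iteration writes the same total amount of data, so the reported MB/s figures are directly comparable across block sizes.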
For example:

4K block size

[replica 3 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 11.2207 s, 95.7 MB/s

[replica 2 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 12.0149 s, 89.4 MB/s

512K block size

[replica 3 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=512K count=2048
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 5.27207 s, 204 MB/s

[replica 2 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=512K count=2048
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 4.22321 s, 254 MB/s

With the bigger block size it's still not where I expect it to be, but at least it starts to make some sense.

I've been trying to solve this for a very long time with no luck. I've already tried both kernel tuning (different 'tuned' profiles and the settings recommended in the "Linux Kernel Tuning" section of the Gluster docs) and tweaking gluster volume options, including write-behind/flush-behind/write-behind-window-size. The latter, to my surprise, made no difference at all. At first I thought it was a buffering issue, but it turns out writes are buffered, just not very efficiently (at least that is what the gluster profile output suggests):

root@fsnode2 ~ $ gluster volume profile r3vol info clear
> ...
> Cleared stats.
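For reference, the write-behind tweaks mentioned above were along these lines. The option names are the standard Gluster ones; the values shown are only illustrative, not the exact ones tried:

```shell
# Standard gluster write-behind knobs (values here are examples only)
gluster volume set r3vol performance.write-behind on
gluster volume set r3vol performance.flush-behind on
gluster volume set r3vol performance.write-behind-window-size 4MB
```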
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
> 262144+0 records in
> 262144+0 records out
> 1073741824 bytes (1.1 GB) copied, 10.9743 s, 97.8 MB/s

root@fsnode2 ~ $ gluster volume profile r3vol info
> Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
> -------------------------------------------------------
> Cumulative Stats:
>    Block Size:     4096b+    8192b+   16384b+   32768b+   65536b+  131072b+
>  No. of Reads:          0         0         0         0         0         0
> No. of Writes:       1576      4173     19605      7777      1847       657
>
>  %-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
>  ---------  -----------  -----------  -----------  ------------  ----
>       0.00      0.00 us      0.00 us      0.00 us             1  RELEASE
>       0.00     18.00 us     18.00 us     18.00 us             1  STATFS
>       0.00     20.50 us     11.00 us     30.00 us             2  FLUSH
>       0.00     22.50 us     17.00 us     28.00 us             2  FINODELK
>       0.01     76.50 us     65.00 us     88.00 us             2  FXATTROP
>       0.01    177.00 us    177.00 us    177.00 us             1  CREATE
>       0.02     56.14 us     23.00 us    128.00 us             7  LOOKUP
>       0.02    259.00 us     20.00 us    498.00 us             2  ENTRYLK
>      99.94     59.23 us     17.00 us  10914.00 us         35635  WRITE
>
>     Duration: 38 seconds
>    Data Read: 0 bytes
> Data Written: 1073741824 bytes
>
> Interval 0 Stats: [identical to Cumulative Stats, trimmed]
>
> Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
> -------------------------------------------------------
> Cumulative Stats:
>    Block Size:     4096b+    8192b+   16384b+   32768b+   65536b+  131072b+
>  No. of Reads:          0         0         0         0         0         0
> No. of Writes:       1576      4173     19605      7777      1847       657
>
>  %-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
>  ---------  -----------  -----------  -----------  ------------  ----
>       0.00      0.00 us      0.00 us      0.00 us             1  RELEASE
>       0.00     33.00 us     33.00 us     33.00 us             1  STATFS
>       0.00     22.50 us     13.00 us     32.00 us             2  ENTRYLK
>       0.00     32.00 us     26.00 us     38.00 us             2  FLUSH
>       0.01     47.50 us     16.00 us     79.00 us             2  FINODELK
>       0.01    157.00 us    157.00 us    157.00 us             1  CREATE
>       0.01     92.00 us     70.00 us    114.00 us             2  FXATTROP
>       0.03     72.57 us     39.00 us    121.00 us             7  LOOKUP
>      99.94     47.97 us     15.00 us   1598.00 us         35635  WRITE
>
>     Duration: 38 seconds
>    Data Read: 0 bytes
> Data Written: 1073741824 bytes
>
> Interval 0 Stats: [identical to Cumulative Stats, trimmed]
>
> Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
> -------------------------------------------------------
> Cumulative Stats:
>    Block Size:     4096b+    8192b+   16384b+   32768b+   65536b+  131072b+
>  No. of Reads:          0         0         0         0         0         0
> No. of Writes:       1576      4173     19605      7777      1847       657
>
>  %-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
>  ---------  -----------  -----------  -----------  ------------  ----
>       0.00      0.00 us      0.00 us      0.00 us             1  RELEASE
>       0.00     58.00 us     58.00 us     58.00 us             1  STATFS
>       0.00     38.00 us     38.00 us     38.00 us             2  ENTRYLK
>       0.01     59.00 us     32.00 us     86.00 us             2  FLUSH
>       0.01     81.00 us     33.00 us    129.00 us             2  FINODELK
>       0.01     91.50 us     73.00 us    110.00 us             2  FXATTROP
>       0.01    239.00 us    239.00 us    239.00 us             1  CREATE
>       0.04    103.14 us     63.00 us    210.00 us             7  LOOKUP
>      99.92     52.99 us     16.00 us  11289.00 us         35635  WRITE
>
>     Duration: 38 seconds
>    Data Read: 0 bytes
> Data Written: 1073741824 bytes
>
> Interval 0 Stats: [identical to Cumulative Stats, trimmed]

At this point I have officially run out of ideas on where to look next, so any help, suggestions or pointers are highly appreciated!

--
Best regards,
Anastasia Belyaeva
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users