Hey there, just a little update... This week we switched from our 3 "old" gluster servers to 3 new ones, and with that we threw some hardware at the problem...
old: 3 servers, each with 4 x 10 TB disks; each disk used as a brick -> 4 x 3 = 12, distribute-replicate.

new: 3 servers, each with 10 x 10 TB disks; we built 2 RAID10 arrays (6 disks and 4 disks), and each RAID10 is a brick -> we split our data into 2 volumes, 1 x 3 = 3, replicate. As filesystem we now use XFS (instead of ext4) with the mount options inode64,noatime,nodiratime.

What we've seen so far: the volumes are independent - if one volume is under load, the other one isn't affected by that. Throughput, latency etc. seem to be better now.

Of course you waste a lot of disk space with a RAID10 + replicate setup: 100 TB per server (so 300 TB in total) result in a volume size of only ~50 TB. But during the last year we had problems due to hard disk errors, and the resulting brick restores (reset-brick) took very long. That was a hard time... :-/

So our conclusion was: since a heal can be really painful - it can take very long and hurt performance badly - try to avoid having to do "big" heals at all. That's why we chose RAID10: under normal circumstances (a disk failing from time to time) there may be a RAID resync, but that should be faster and cause fewer performance problems than restoring a complete brick.

Or, more generally: if you have big, slow disks and quite high I/O, think about not using single disks as bricks. If you have the hardware (and the money), think about RAID1 or RAID10 instead. The smaller and/or faster the disks are (e.g. if you have a lot of 1 TB SSDs/NVMes), the better using them directly as bricks might work, as the heal after a disk failure should be much faster. I can't say anything about RAID5/6 - we didn't take it into consideration.

Just my 2 €cents from a (still) gluster amateur :-)
Best regards,
Hubert

On Tue, Jan 22, 2019 at 07:11 Amar Tumballi Suryanarayan <atumb...@redhat.com> wrote:
>
> On Thu, Jan 10, 2019 at 1:56 PM Hu Bert <revi...@googlemail.com> wrote:
>>
>> Hi,
>>
>> > > We are also using 10 TB disks, heal takes 7-8 days.
>> > > You can play with the "cluster.shd-max-threads" setting. It is 1 by
>> > > default, I think. I am using it with 4.
>> > > Below you can find more info:
>> > > https://access.redhat.com/solutions/882233
>> > cluster.shd-max-threads: 8
>> > cluster.shd-wait-qlength: 10000
>>
>> Our setup:
>> cluster.shd-max-threads: 2
>> cluster.shd-wait-qlength: 10000
>>
>> > >> Volume Name: shared
>> > >> Type: Distributed-Replicate
>> > Ah, you have a distributed-replicated volume, but I chose only
>> > replicated (for simplicity, for the beginning :)
>> > Maybe replicated volumes are healing faster?
>>
>> Well, maybe our setup with 3 servers and 4 disks = bricks each, i.e. 12
>> bricks, resulting in a distributed-replicate volume (all /dev/sd{a,b,c,d}
>> identical), isn't optimal? And it would be better to create a replica 3
>> volume with only 1 (big) brick per server (with 4 disks: either a logical
>> volume or sw/hw raid)?
>>
>> But it would be interesting to know if a replicate volume is healing
>> faster than a distributed-replicate volume - even if there was only 1
>> faulty brick.
>
> We don't have any data point to agree to this, but it may be true.
> Especially, as the crawling when DHT (i.e. distribute) is involved can
> get a little slower, which means the healing would get slower too.
>
> We are trying to experiment with a few performance enhancement patches
> (like https://review.gluster.org/20636); it would be great to see how
> things work with the newer base. Will keep the list updated about
> performance numbers once we have some more data on them.
>
> -Amar
>
>> Thx
>> Hubert
>
> --
> Amar Tumballi (amarts)
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users