Hey there,
just a little update...

This week we switched from our 3 "old" gluster servers to 3 new ones,
and with that we threw some hardware at the problem...

old: 3 servers, each with 4 * 10 TB disks; each disk is used as a brick
-> 4 x 3 = 12 distribute-replicate
new: 3 servers, each with 10 * 10 TB disks; we built 2 RAID10 arrays
(6 disks and 4 disks), and each RAID10 is a brick -> we split our data
into 2 volumes, each 1 x 3 = 3 replicate; as filesystem we now use XFS
(instead of ext4) with the mount options inode64,noatime,nodiratime.
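
Roughly what we did per server for one of the new volumes (just a
sketch; hostnames, device names and paths are placeholders, not our
real ones):

  # 6-disk RAID10 that becomes one brick
  mdadm --create /dev/md0 --level=10 --raid-devices=6 /dev/sd[b-g]

  # XFS on top of the array
  mkfs.xfs /dev/md0

  # /etc/fstab entry with the mount options mentioned above
  /dev/md0  /gluster/brick1  xfs  inode64,noatime,nodiratime  0 0

  # replica-3 volume with exactly one brick per server
  gluster volume create shared1 replica 3 \
      gluster1:/gluster/brick1/shared1 \
      gluster2:/gluster/brick1/shared1 \
      gluster3:/gluster/brick1/shared1
  gluster volume start shared1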

What we've seen so far: the volumes are independent - if one volume is
under load, the other one isn't affected by that. Throughput, latency
etc. seem to be better now.

Of course you waste a lot of disk space with a RAID10 + replicate
setup: 100 TB of raw disk per server (so 300 TB in total) result in
only ~50 TB of usable volume capacity (RAID10 halves the raw capacity,
and replica 3 keeps just one usable copy across the 3 servers). But
during the last year we had problems due to hard disk errors, and the
resulting brick restores (reset-brick) took very long. That was a hard
time... :-/
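
For reference, this is roughly the reset-brick workflow that was so
painful for us (volume and brick names are just examples):

  # take the failed brick out of service
  gluster volume reset-brick shared gluster1:/gluster/bricksda1/shared start

  # ... replace the disk, recreate the filesystem and brick directory ...

  # bring the (now empty) brick back in; the self-heal daemon then has
  # to copy the complete brick contents from the other replicas
  gluster volume reset-brick shared gluster1:/gluster/bricksda1/shared \
      gluster1:/gluster/bricksda1/shared commit force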

So our conclusion was: since a heal can be really painful, take very
long and hurt performance badly -> try to avoid "big" heals in the
first place. That's why we chose RAID10: under normal circumstances
(a disk failing from time to time) there may be a RAID resync, but
that may well be faster and cause fewer performance issues than having
to restore a complete brick.
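
In case it helps for comparison, these are the commands we'd use to
watch both situations (volume name again just an example):

  # pending self-heal entries after a brick restore
  gluster volume heal shared1 info
  gluster volume heal shared1 statistics heal-count

  # progress of an mdadm RAID10 resync after a disk swap
  cat /proc/mdstat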

Or, more generally: if you have big, slow disks and quite high I/O ->
think about not using single disks as bricks. If you have the hardware
(and the money), think about using RAID1 or RAID10. The smaller and/or
faster the disks are (e.g. if you have a lot of 1 TB SSDs/NVMes), the
better using them directly as bricks might work, as (in case of a disk
failure) the heal should be much faster. I can't say anything about
RAID5/6; we didn't consider it... just my 2 €cents from a (still)
gluster amateur :-)
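
One more thing: if you do stay with single-disk bricks, the self-heal
daemon tuning mentioned further down in the thread is worth a look,
e.g. (volume name just an example):

  # more parallel heals, at the cost of extra load during the heal
  gluster volume set shared cluster.shd-max-threads 4
  gluster volume set shared cluster.shd-wait-qlength 10000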

Best regards,
Hubert

On Tue, Jan 22, 2019 at 07:11 Amar Tumballi Suryanarayan
<atumb...@redhat.com> wrote:
>
> On Thu, Jan 10, 2019 at 1:56 PM Hu Bert <revi...@googlemail.com> wrote:
>>
>> Hi,
>>
>> > > We are also using 10TB disks; a heal takes 7-8 days.
>> > > You can play with the "cluster.shd-max-threads" setting. It is 1 by
>> > > default, I think. I am using it with 4.
>> > > Below you can find more info:
>> > > https://access.redhat.com/solutions/882233
>> > cluster.shd-max-threads: 8
>> > cluster.shd-wait-qlength: 10000
>>
>> Our setup:
>> cluster.shd-max-threads: 2
>> cluster.shd-wait-qlength: 10000
>>
>> > >> Volume Name: shared
>> > >> Type: Distributed-Replicate
>> > Ah, you have a distributed-replicated volume, but I chose only
>> > replicated (for simplicity at the beginning :)
>> > Maybe replicated volumes are healing faster?
>>
>> Well, maybe our setup with 3 servers and 4 disks=bricks == 12 bricks,
>> resulting in a distributed-replicate volume (all /dev/sd{a,b,c,d}
>> identical), isn't optimal? And would it be better to create a
>> replicate 3 volume with only 1 (big) brick per server (with 4 disks:
>> either a logical volume or sw/hw RAID)?
>>
>> But it would be interesting to know if a replicate volume is healing
>> faster than a distributed-replicate volume - even if there was only 1
>> faulty brick.
>>
>
> We don't have any data points to confirm this, but it may be true. Especially
> as the crawling can get a little slower when DHT (i.e., distribute) is
> involved, which means the healing would get slower too.
>
> We are experimenting with a few performance enhancement patches (like
> https://review.gluster.org/20636); it would be great to see how things work
> with the newer base. We will keep the list updated about performance numbers
> once we have some more data on them.
>
> -Amar
>
>>
>>
>> Thx
>> Hubert
>
>
> --
> Amar Tumballi (amarts)
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
