On Thu, Jun 8, 2017 at 12:49 PM, Pranith Kumar Karampuri <[email protected]> wrote:
> On Fri, Jun 2, 2017 at 1:01 AM, Serkan Çoban <[email protected]> wrote:
>>
>> >Is it possible that this matches your observations?
>> Yes, that matches what I see. So 19 files are being healed in parallel
>> by 19 SHD processes. I thought only one file was being healed at a time.
>> Then what is the meaning of the disperse.shd-max-threads parameter? If I
>> set it to 2, will each SHD process heal two files at the same time?
>
> Yes, that is the idea.
> One small correction: if you have n*(16+4) and the server has at least
> one brick contributing to these n subvolumes, then the number of heals
> it will do will be 'n', and if you set max-threads to 2 then it will be
> 2n. So the option is per EC subvolume.
>
>> >How many IOPS can your bricks handle?
>> Bricks are 7200RPM NL-SAS disks, 70-80 random IOPS max. But the write
>> pattern seems sequential: 30-40MB bulk writes every 4-5 seconds.
>> This is what iostat shows.
>>
>> >Do you have a test environment where we could check all this?
>> Not currently, but I will have one in 4-5 weeks. New servers are
>> arriving; I will add this test to my notes.
>>
>> >There's a feature to configure the self-heal block size to optimize
>> >these cases. The feature is available on 3.11.
>> I did not see this in the 3.11 release notes; what parameter name
>> should I look for?
>
> disperse.self-heal-window-size
>
> +    { .key = {"self-heal-window-size"},
> +      .type = GF_OPTION_TYPE_INT,
> +      .min = 1,
> +      .max = 1024,
> +      .default_value = "1",
> +      .description = "Maximum number blocks(128KB) per file for which "
> +                     "self-heal process would be applied simultaneously."
> +    },
>
> This is the patch: https://review.gluster.org/17098
>
> +Sunil,
> Could you add this to the release notes please?
>
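For reference, both options are ordinary volume options and can be set with
the usual volume-set interface, along these lines (the volume name "testvol"
is only a placeholder, and disperse.self-heal-window-size needs 3.11 or
later, as noted above):

    gluster volume set testvol disperse.shd-max-threads 2
    gluster volume set testvol disperse.self-heal-window-size 8

Going by the option description in the patch, a window size of 8 should let
self-heal work on 8 x 128KB (1MB) of a file per iteration instead of the
default single 128KB block.
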
>>
>> On Thu, Jun 1, 2017 at 10:30 AM, Xavier Hernandez <[email protected]> wrote:
>> > Hi Serkan,
>> >
>> > On 30/05/17 10:22, Serkan Çoban wrote:
>> >>
>> >> Ok, I understand that the heal operation takes place on the server
>> >> side. In this case I should see X KB of outbound network traffic from
>> >> 16 servers and 16X KB of inbound traffic to the failed brick's server,
>> >> right? So that process will get 16 chunks, recalculate the missing
>> >> chunk and write it to disk.
>> >
>> > That should be the normal operation for a single heal.
>> >
>> >> The problem is I am not seeing that kind of traffic on the servers.
>> >> In my configuration (16+4 EC) all 20 servers have 7-8MB outbound
>> >> traffic and none of them has more than 10MB incoming traffic.
>> >> Only the heal operation is happening on the cluster right now, no
>> >> client or other traffic. I see a constant 7-8MB write to the healing
>> >> brick's disk. So where is the missing traffic?
>> >
>> > Not sure about your configuration, but probably you are seeing the
>> > result of having the SHD of each server doing heals. That would explain
>> > the network traffic you have.
>> >
>> > Suppose that all SHDs but the one on the damaged brick are working. In
>> > this case 19 servers will read 16 fragments each. This gives 19 * 16 =
>> > 304 fragments to be requested. EC balances the reads among all
>> > available servers, and there's a chance (1/19) that a fragment is local
>> > to the server asking for it. So we'll need a total of 304 - 304 / 19 =
>> > 288 network requests, 288 / 19 = 15.2 sent by each server.
>> >
>> > If we have a total of 288 requests, it means that each server will
>> > answer 288 / 19 = 15.2 requests. The net effect of all this is that
>> > each healthy server is sending 15.2*X bytes of data and each server is
>> > receiving 15.2*X bytes of data.
>> >
>> > Now we need to account for the writes to the damaged brick. We have 19
>> > simultaneous heals. This means that the damaged brick will receive 19*X
>> > bytes of data, and each healthy server will send X additional bytes of
>> > data.
>> >
>> > So:
>> >
>> > A healthy server receives 15.2*X bytes of data
>> > A healthy server sends 16.2*X bytes of data
>> > A damaged server receives 19*X bytes of data
>> > A damaged server sends few bytes of data (communication and
>> > synchronization overhead, basically)
>> >
>> > As you can see, in this configuration each server has almost the same
>> > amount of inbound and outbound traffic. The only big difference is the
>> > damaged brick, which should receive a little more traffic but send much
>> > less.
>> >
>> > Is it possible that this matches your observations?
>> >
>> > There's one more thing to consider here, and it's the apparent low
>> > throughput of self-heal. One possible thing to check is the small size
>> > and random behavior of the requests.
>> >
>> > Assuming that each request has a size of ~128 / 16 = 8KB, at a rate of
>> > ~8 MB/s the servers are processing ~1000 IOPS. Since requests are going
>> > to 19 different files, even if each file is accessed sequentially, the
>> > real effect will be like random access (some read-ahead on the
>> > filesystem can improve reads a bit, but writes won't benefit so much).
>> >
>> > How many IOPS can your bricks handle?
>> >
>> > Do you have a test environment where we could check all this? If
>> > possible, it would be interesting to have only a single SHD (kill all
>> > SHDs on all servers but one). In this situation, without client
>> > accesses, we should see the 16/1 ratio of reads vs writes on the
>> > network. We should also see a similar or even a little better speed,
>> > because all reads and writes will be sequential, optimizing the
>> > available IOPS.
>> >
>> > There's a feature to configure the self-heal block size to optimize
>> > these cases. The feature is available on 3.11.
>> >
>> > Best regards,
>> >
>> > Xavi
>> >
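To make the arithmetic above easier to play with, here is a small Python
sketch of the traffic model Xavi describes (the 16+4 layout and the 19
healthy SHDs come from this thread; X stands for the fragment bytes moved
per file, and the function name is just illustrative):

    def heal_traffic(data_bricks=16, healthy_shds=19, x=1.0):
        # Each healthy SHD heals one file and needs one fragment from
        # each of the 'data_bricks' data bricks.
        total_fragments = healthy_shds * data_bricks          # 19 * 16 = 304
        # Reads are balanced across the healthy servers, so about
        # 1/healthy_shds of them are local and never hit the network.
        network_reads = total_fragments - total_fragments / healthy_shds  # 288
        reads_per_server = network_reads / healthy_shds       # ~15.2
        healthy_recv = reads_per_server * x                   # fragments it fetches
        healthy_send = (reads_per_server + 1) * x             # answers served + 1 write to the damaged brick
        damaged_recv = healthy_shds * x                       # 19 rebuilt fragments written to it
        return healthy_recv, healthy_send, damaged_recv

    recv, send, damaged = heal_traffic()
    print("healthy server: receives ~%.1f*X, sends ~%.1f*X" % (recv, send))
    print("damaged brick:  receives ~%.1f*X" % damaged)

With 8KB requests (one 128KB block split across 16 data bricks), the ~8 MB/s
seen per server works out to roughly 1000 requests per second, which is the
IOPS figure mentioned above.
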
>> >>
>> >> On Tue, May 30, 2017 at 10:25 AM, Ashish Pandey <[email protected]> wrote:
>> >>>
>> >>> When we say client side heal or server side heal, we are basically
>> >>> talking about the side which "triggers" the heal of a file.
>> >>>
>> >>> 1 - server side heal - shd scans indices and triggers heal
>> >>>
>> >>> 2 - client side heal - a fop finds that a file needs heal and
>> >>> triggers heal for that file.
>> >>>
>> >>> Now, what happens when heal gets triggered?
>> >>> In both cases the following functions take part -
>> >>>
>> >>> ec_heal => ec_heal_throttle => ec_launch_heal
>> >>>
>> >>> ec_launch_heal just creates heal tasks (with ec_synctask_heal_wrap,
>> >>> which calls ec_heal_do) and puts them into a queue.
>> >>> This happens on the server, and the "syncenv" infrastructure, which
>> >>> is nothing but a set of workers, picks up these tasks and executes
>> >>> them. That is when the actual read/write for heal happens.
>> >>>
>> >>> ________________________________
>> >>> From: "Serkan Çoban" <[email protected]>
>> >>> To: "Ashish Pandey" <[email protected]>
>> >>> Cc: "Gluster Users" <[email protected]>
>> >>> Sent: Monday, May 29, 2017 6:44:50 PM
>> >>> Subject: Re: [Gluster-users] Heal operation detail of EC volumes
>> >>>
>> >>>>> Healing could be triggered by client side (access of file) or
>> >>>>> server side (shd).
>> >>>>> However, in both cases the actual heal starts from the "ec_heal_do"
>> >>>>> function.
>> >>>
>> >>> If I do a recursive getfattr operation from the clients, then all
>> >>> heal operations are done on the clients, right? The client reads the
>> >>> chunks, recalculates and writes the missing chunk.
>> >>> And if I don't access the files from a client, then the SHD daemons
>> >>> will start the heal and read, recalculate and write the missing
>> >>> chunks, right?
>> >>>
>> >>> In the first case the EC calculations take place in the client fuse
>> >>> process; in the second case the EC calculations are made in the SHD
>> >>> process, right? Does the brick process have any role in EC
>> >>> calculations?
>> >>>
>> >>> On Mon, May 29, 2017 at 3:32 PM, Ashish Pandey <[email protected]> wrote:
>> >>>>
>> >>>> ________________________________
>> >>>> From: "Serkan Çoban" <[email protected]>
>> >>>> To: "Gluster Users" <[email protected]>
>> >>>> Sent: Monday, May 29, 2017 5:13:06 PM
>> >>>> Subject: [Gluster-users] Heal operation detail of EC volumes
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> When a brick fails in EC, what is the healing read/write data path?
>> >>>> Which processes do the operations?
>> >>>>
>> >>>> Healing could be triggered by client side (access of file) or
>> >>>> server side (shd).
>> >>>> However, in both cases the actual heal starts from the "ec_heal_do"
>> >>>> function.
>> >>>>
>> >>>> Assume a 2GB file is being healed in a 16+4 EC configuration. I was
>> >>>> thinking that the SHD daemon on the failed brick's host will read
>> >>>> 2GB from the network, reconstruct its 100MB chunk and write it onto
>> >>>> the brick. Is this right?
>> >>>>
>> >>>> You are correct about the read/write.
>> >>>> The only point is that the SHD daemon on one of the good bricks will
>> >>>> pick up the index entry and heal it.
>> >>>> The SHD daemon scans the .glusterfs/indices directory and heals the
>> >>>> entries. If the brick went down while IO was going on, the index
>> >>>> will be present on the killed brick also.
>> >>>> However, if a brick was down and then you started writing to a file,
>> >>>> then in this case the index entry would not be present on the killed
>> >>>> brick.
>> >>>> So even after that brick comes back up, the shd on that brick will
>> >>>> not be able to find this index. However, the other bricks will have
>> >>>> the entries and the shd on those bricks will heal it.
>> >>>>
>> >>>> Note: I am considering each brick to be on a different node.
>> >>>>
>> >>>> Ashish
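If you want to see what self-heal still has queued while this is going on,
something along these lines is usually enough (volume name and brick path
are placeholders for your own setup, and the exact index path can vary
between Gluster versions):

    gluster volume heal testvol info
    # rough count of pending index entries on one brick of a node
    ls /bricks/brick1/.glusterfs/indices/xattrop | wc -l
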
>
> --
> Pranith
>

--
Pranith
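For the single-SHD test suggested earlier in the thread, one rough way to
run it (the commands are illustrative; adapt them to your own nodes and
volume name) is to stop the self-heal daemon everywhere except one node and
bring it back afterwards:

    # on every node except the one that should keep healing
    kill $(pgrep -f glustershd)

    # later, to respawn the self-heal daemons
    gluster volume start testvol force
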
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
