Looks awesome! Thank you Kengo. I think this is an important change that can further support dev efficiency and enable us to release early, release often.
On Thu, Nov 19, 2020 at 8:23 AM Kengo Seki <[email protected]> wrote:

> Thank you for the comment, Evans and Olaf! Following your advice, I
> did the following:
>
> * Removed unused EBS volumes (1TBx2 and 30GBx1).
> * Replaced slaves (02, 03, 06, 07) with newly created EC2 instances.
>   Instance types were also upgraded (m3, m4 -> m5).
> * Attached EBS volumes to the above instances: 200GB each to 02 and 03,
>   800GB each to 06 and 07.
>   (I said in the past that I was going to split two 1TB volumes into
>   four 500GB volumes, but slave 06 had used 660GB+ before replacing, so
>   I changed the allocation)
>
> So we're now using the following resources, in accordance with Evans'
> email:
>
> * EC2 instances: one m3.xlarge (master), three m5.xlarge (slave 02, 03
>   and 07) and one m5.2xlarge (slave 06)
> * EBS volumes: one 1TB gp2 (master), two 200GB gp2 (slave 02 and 03),
>   and two 800GB gp2 (slave 06 and 07)
>
> New servers seem to work fine with Jenkins [1]. I also updated the
> "Bigtop CI Setup Guide" page on cwiki [2].
> Let me know if you find something wrong! :)
>
> [1]: https://ci.bigtop.apache.org/computer/
> [2]: https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+CI+Setup+Guide#BigtopCISetupGuide-SetupJenkinsslaves
>
> Kengo Seki <[email protected]>
>
> On Wed, Nov 11, 2020 at 4:59 AM Olaf Flebbe <[email protected]> wrote:
> >
> > Hi,
> >
> > Fully supporting Evans:
> > The unconnected disks do not contain anything valuable, please remove
> > them. It might make sense to even recreate the current disks on SSD,
> > a bit larger than before if needed.
> >
> > Olaf
> >
> > > On 10.11.2020 at 08:09, Evans Ye <[email protected]> wrote:
> > >
> > > Yes I think overall your plan is good.
> > > What's the purpose of leveraging EBS snapshot? Is it to back up the
> > > things we have before migration?
> > > Except for the master node (which has Jenkins settings stored on
> > > disk), all those slaves can be wiped out directly.
> > >
> > > On Tue, Nov 10, 2020 at 2:42 PM Kengo Seki <[email protected]> wrote:
> > >
> > >> Thanks everyone for the information! Now I understand our
> > >> circumstances.
> > >> So we're going to split the two 1TB volumes attached to slave06 and 07
> > >> into four 500GB volumes (and change their type to gp2), reattach them
> > >> to 02, 03, 06 and 07, and remove the two currently unused 1TB volumes,
> > >> right?
> > >>
> > >>> Kengo would you like to take this, or do you need help?
> > >>
> > >> I think I can do them somehow (maybe using EBS snapshot?), but let me
> > >> ask for your help if I'm stuck. :)
> > >>
> > >> Kengo Seki <[email protected]>
> > >>
> > >> On Tue, Nov 10, 2020 at 1:00 AM Evans Ye <[email protected]> wrote:
> > >>>
> > >>> OK. I got it now.
> > >>> So the newly created volumes are currently attached to slave06_2 and
> > >>> slave07_2, respectively.
> > >>> However, they're standard HDD, not GP2 SSD. I think we can take this
> > >>> chance to recreate those 2 slaves and do an overhaul of our
> > >>> infrastructure.
> > >>>
> > >>> Kengo would you like to take this, or do you need help?
> > >>>
> > >>> Evans
> > >>>
> > >>> On Fri, Nov 6, 2020 at 2:40 AM Olaf Flebbe <[email protected]> wrote:
> > >>>
> > >>>> Hi,
> > >>>>
> > >>>> OMG. I think I did it.
> > >>>>
> > >>>> A few years ago two of the instances had hardware problems and did
> > >>>> not reboot any more, the filesystem was corrupted and so on. That
> > >>>> was at the time of the Spectre vulnerability discovery (2018). At
> > >>>> that time AWS had major instabilities since updating firmware seems
> > >>>> to have failed for some classes of hardware.
> > >>>>
> > >>>> I tried to recreate them as closely as possible but I may have
> > >>>> accidentally left the volumes around. Please let's delete them.
> > >>>>
> > >>>> Olaf
> > >>>>
> > >>>>> On 05.11.2020 at 14:44, Konstantin Boudnik <[email protected]> wrote:
> > >>>>>
> > >>>>> Thanks Evans!
> > >>>>>
> > >>>>> It's great you found the details: they are definitely accurate as I
> > >>>>> am recalling now. Kengo, do you think splitting the volumes would
> > >>>>> help us for a while? Or perhaps we shall try to expand the resource
> > >>>>> pool (which might take a while)?
> > >>>>>
> > >>>>> Thanks!
> > >>>>> Cos
> > >>>>>
> > >>>>> On Thu, Nov 05, 2020 at 12:32PM, Evans Ye wrote:
> > >>>>>> In fact, the original deal for our resources is as follows:
> > >>>>>>
> > >>>>>>> 1 m3.2xlarge for CI
> > >>>>>>> 4 m3.xlarge for CI and demo
> > >>>>>>> 3 1TB EBS volumes
> > >>>>>>> 5 elastic IP addresses
> > >>>>>>
> > >>>>>> So technically we should not use those 2 additional 1TB volumes
> > >>>>>> (created in 2018).
> > >>>>>> Instead, I think what we can do is to split up one of the existing
> > >>>>>> 1TB volumes (e.g. the one attached to slave07) into smaller volumes
> > >>>>>> for slave02, 03.
> > >>>>>>
> > >>>>>>
> > >>>>>> On Wed, Nov 4, 2020 at 2:28 PM Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Kengo,
> > >>>>>>>
> > >>>>>>> We had an agreement with EMR folks that we are using the resources
> > >>>>>>> available to us and it is included in their budget (or something
> > >>>>>>> to this extent). If you see some of the resources available under
> > >>>>>>> our account - I don't see why we can't use them.
> > >>>>>>>
> > >>>>>>> If for whatever reason we need to expand the pool, that would
> > >>>>>>> require a separate conversation with nice folks from that team, I
> > >>>>>>> imagine. Please let me know if I can help with this going forward.
> > >>>>>>>
> > >>>>>>> Thanks!
> > >>>>>>> Cos
> > >>>>>>>
> > >>>>>>> On Wed, Nov 04, 2020 at 11:11AM, Kengo Seki wrote:
> > >>>>>>>> Thanks for the comment, Cos! I was able to start the docker
> > >>>>>>>> service on docker-slave-02 without replacing it and am running
> > >>>>>>>> some Jenkins jobs on it now, so I'll replace it in the near
> > >>>>>>>> future.
> > >>>>>>>> I have a few things that I'd like to ask additionally:
> > >>>>>>>>
> > >>>>>>>> * docker-slave-02 and 03 have a gp2 storage as a root volume that
> > >>>>>>>>   has only 8GiB capacity, and they sometimes run short and stop
> > >>>>>>>>   the CI.
> > >>>>>>>>   May I increase them to 20 or 30 GiB when I replace those
> > >>>>>>>>   instances? (I'm not sure what our budget is)
> > >>>>>>>>
> > >>>>>>>> * They use an instance store with 30GiB to put docker images
> > >>>>>>>>   into, and they also sometimes run short.
> > >>>>>>>>   It seems there are two unused volumes with 1TiB (vol-ae71114e
> > >>>>>>>>   and vol-4efa69ae) on the AWS console.
> > >>>>>>>>   May I attach them to 02 and 03 instead of instance stores, or
> > >>>>>>>>   are they backups or something?
> > >>>>>>>>
> > >>>>>>>> Kengo Seki <[email protected]>
> > >>>>>>>>
> > >>>>>>>> On Mon, Nov 2, 2020 at 6:41 PM Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>>>>
> > >>>>>>>>> I'd say let's replace the broken one. I don't think there's a
> > >>>>>>>>> sentimental value attached ;)
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> With regards,
> > >>>>>>>>> Cos
> > >>>>>>>>>
> > >>>>>>>>> On 02.11.2020 08:16, Kengo Seki wrote:
> > >>>>>>>>>> Thanks for updating, Olaf! I've just noticed the Jenkins UI
> > >>>>>>>>>> became cool :)
> > >>>>>>>>>> Regarding docker-slave-02, I'll try to replace it after waiting
> > >>>>>>>>>> for a while to make sure there's no objection.
> > >>>>>>>>>>
> > >>>>>>>>>> Kengo Seki <[email protected]>
> > >>>>>>>>>>
> > >>>>>>>>>> On Mon, Nov 2, 2020 at 1:39 PM Jun HE <[email protected]> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks a lot for the update, Olaf!
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Sat, Oct 31, 2020 at 3:24 AM Olaf Flebbe <[email protected]> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> All machines are patched. Jenkins and its plugins are updated.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Things to be noted:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> * Slave 2 seems to be in serious trouble. The disk image seems
> > >>>>>>>>>>>>   to be corrupt, I would say.
> > >>>>>>>>>>>>   One of the problems: docker does not start any more.
> > >>>>>>>>>>>>   Is there anything important on it? If yes, please contact
> > >>>>>>>>>>>>   me. I would recommend setting up slave2 from scratch again.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> * There was a warning regarding the Copy Artifacts Plugin. It
> > >>>>>>>>>>>>   now imposes stricter rules. Not sure if there is a job
> > >>>>>>>>>>>>   depending on it.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> * I removed the CVS plugin.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Everything else seems to be working as usual.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>> Olaf
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On 30.10.2020 at 19:09, Olaf Flebbe <[email protected]> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I am doing an update of the machines in CI. It seems a
> > >>>>>>>>>>>>> couple of security fixes are to be applied.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Olaf
