Re: Update

Evans Ye Sat, 21 Nov 2020 09:51:10 -0800

Looks awesome! Thank you, Kengo. I think this is an important change that
can further improve dev efficiency and enable us to release early, release
often.
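A minimal Python sketch, under stated assumptions, that checks the post-migration allocation Kengo lists below against the original resource deal Evans quotes further down (1 m3.2xlarge, 4 m3.xlarge, 3 1TB EBS volumes). It compares instances by size class only, because the family moved from m3/m4 to m5, and it assumes 1TB is counted as 1000GB to match the round figures used in the thread. The names `INSTANCES`, `EBS_GB`, `size_class` and `check_allocation` are hypothetical.

```python
from collections import Counter

# Original deal quoted by Evans: 1 m3.2xlarge, 4 m3.xlarge, 3 x 1TB EBS volumes.
# Assumption: 1TB is counted as 1000GB, matching the round GB figures in the thread.
DEAL_SIZES = Counter({"2xlarge": 1, "xlarge": 4})
DEAL_EBS_GB = 3 * 1000

# Allocation after the migration, as listed by Kengo (hypothetical key names).
INSTANCES = {
    "master": "m3.xlarge",
    "slave02": "m5.xlarge",
    "slave03": "m5.xlarge",
    "slave06": "m5.2xlarge",
    "slave07": "m5.xlarge",
}
EBS_GB = {"master": 1000, "slave02": 200, "slave03": 200,
          "slave06": 800, "slave07": 800}


def size_class(instance_type: str) -> str:
    """Strip the family prefix, e.g. 'm5.2xlarge' -> '2xlarge'."""
    return instance_type.split(".", 1)[1]


def check_allocation(instances, ebs_gb):
    """Return (sizes match the deal?, total EBS GB, total within the EBS budget?)."""
    sizes = Counter(size_class(t) for t in instances.values())
    total_gb = sum(ebs_gb.values())
    return sizes == DEAL_SIZES, total_gb, total_gb <= DEAL_EBS_GB


if __name__ == "__main__":
    print(check_allocation(INSTANCES, EBS_GB))  # (True, 3000, True)
```

With the figures from the thread this prints (True, 3000, True): the new layout still has one 2xlarge plus four xlarge instances, and the two former 1TB slave volumes were re-split as 2x200GB + 2x800GB, so total EBS stays at the 3TB of the deal.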


On Thu, Nov 19, 2020 at 8:23 AM Kengo Seki <[email protected]> wrote:

> Thank you for the comments, Evans and Olaf! Following your advice, I
> did the following:
>
> * Removed unused EBS volumes (1TBx2 and 30GBx1).
> * Replaced slaves (02, 03, 06, 07) with newly created EC2 instances.
> Instance types were also upgraded (m3, m4 -> m5).
> * Attached EBS volumes to the above instances: 200GB to 02 and 03,
> 800GB to 06 and 07.
>   (I had said I was going to split two 1TB volumes into four 500GB
> volumes, but slave 06 had used 660GB+ before the replacement, so
> I changed the allocation.)
>
> So we're now using the following resources, in line with Evans'
> email:
>
> * EC2 instances: one m3.xlarge (master), three m5.xlarge (slave 02, 03
>   and 07) and one m5.2xlarge (slave 06)
> * EBS volumes: one 1TB gp2 (master), two 200GB gp2 (slave 02 and 03),
>   and two 800GB gp2 (slave 06 and 07)
>
> New servers seem to work fine with Jenkins [1]. I also updated the
> "Bigtop CI Setup Guide" page on cwiki [2].
> Let me know if you find something wrong! :)
>
> [1]: https://ci.bigtop.apache.org/computer/
> [2]: https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+CI+Setup+Guide#BigtopCISetupGuide-SetupJenkinsslaves
>
> Kengo Seki <[email protected]>
>
> On Wed, Nov 11, 2020 at 4:59 AM Olaf Flebbe <[email protected]> wrote:
> >
> > hi,
> >
> > Fully supporting Evans:
> > the unconnected disks do not contain anything valuable, please remove them.
> > It might even make sense to recreate the current disks on SSD, a bit larger
> > than before if needed.
> >
> > olaf
> >
> > > On 10.11.2020 at 08:09, Evans Ye <[email protected]> wrote:
> > >
> > > Yes, I think your plan is good overall.
> > > What's the purpose of leveraging an EBS snapshot? Is it to back up the
> > > things we have before the migration?
> > > Except for the master node (which has the Jenkins settings stored on
> > > disk), all those slaves can be wiped out directly.
> > >
> > >
> > >
> > > On Tue, Nov 10, 2020 at 2:42 PM Kengo Seki <[email protected]> wrote:
> > >
> > >> Thanks everyone for the information! Now I understand our circumstances.
> > >> So we're going to split the two 1TB volumes attached to slave06 and 07
> > >> into four 500GB volumes (and change their type to gp2), reattach them
> > >> to 02, 03, 06 and 07, and remove the two currently unused 1TB volumes,
> > >> right?
> > >>
> > >>> Kengo, would you like to take this, or do you need help?
> > >>
> > >> I think I can manage it somehow (maybe using EBS snapshots?), but
> > >> I'll ask for your help if I get stuck. :)
> > >>
> > >> Kengo Seki <[email protected]>
> > >>
> > >> On Tue, Nov 10, 2020 at 1:00 AM Evans Ye <[email protected]> wrote:
> > >>>
> > >>> OK. I got it now.
> > >>> So the newly created volumes are currently attached to slave06_2 and
> > >>> slave07_2, respectively.
> > >>> However, they're standard HDD, not gp2 SSD. I think we can take this
> > >>> chance to recreate those 2 slaves and do an overhaul of our
> > >>> infrastructure.
> > >>>
> > >>> Kengo, would you like to take this, or do you need help?
> > >>>
> > >>> Evans
> > >>>
> > >>> On Fri, Nov 6, 2020 at 2:40 AM Olaf Flebbe <[email protected]> wrote:
> > >>>
> > >>>> Hi,
> > >>>>
> > >>>> OMG. I think I did it.
> > >>>>
> > >>>> A few years ago two of the instances had hardware problems and did not
> > >>>> reboot anymore; the filesystem was corrupted and so on. That was at the
> > >>>> time of the Spectre vulnerability discovery (2018). At that time AWS had
> > >>>> major instabilities, since firmware updates seem to have failed for some
> > >>>> classes of hardware.
> > >>>>
> > >>>> I tried to recreate them as closely as possible, but I may have
> > >>>> accidentally left the volumes around. Please let's delete them.
> > >>>>
> > >>>> Olaf
> > >>>>
> > >>>>> On 05.11.2020 at 14:44, Konstantin Boudnik <[email protected]> wrote:
> > >>>>>
> > >>>>> Thanks Evans!
> > >>>>>
> > >>>>> It's great you found the details: they are definitely accurate as I
> > >>>>> recall now. Kengo, do you think splitting the volumes would help us for
> > >>>>> a while? Or perhaps we should try to expand the resource pool (which
> > >>>>> might take a while)?
> > >>>>>
> > >>>>> Thanks!
> > >>>>> Cos
> > >>>>>
> > >>>>> On Thu, Nov 05, 2020 at 12:32PM, Evans Ye wrote:
> > >>>>>> In fact, the original deal for our resources is as follows:
> > >>>>>>
> > >>>>>>> 1 m3.2xlarge for CI
> > >>>>>>> 4 m3.xlarge for CI and demo
> > >>>>>>> 3 1TB EBS volumes
> > >>>>>>> 5 elastic IP addresses
> > >>>>>>
> > >>>>>> So technically we should not use those 2 additional 1TB volumes
> > >>>>>> (created in 2018).
> > >>>>>> Instead, I think what we can do is split up one of the existing 1TB
> > >>>>>> volumes (e.g. the one attached to slave07) into smaller volumes for
> > >>>>>> slave02 and 03.
> > >>>>>>
> > >>>>>>
> > >>>>>> On Wed, Nov 4, 2020 at 2:28 PM Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Kengo,
> > >>>>>>>
> > >>>>>>> We had an agreement with the EMR folks that we use the resources
> > >>>>>>> available to us and that it is included in their budget (or something
> > >>>>>>> to this extent). If you see some of the resources available under our
> > >>>>>>> account, I don't see why we can't use them.
> > >>>>>>>
> > >>>>>>> If for whatever reason we need to expand the pool, that would require a
> > >>>>>>> separate conversation with the nice folks from that team, I imagine.
> > >>>>>>> Please let me know if I can help with this going forward.
> > >>>>>>>
> > >>>>>>> Thanks!
> > >>>>>>> Cos
> > >>>>>>>
> > >>>>>>> On Wed, Nov 04, 2020 at 11:11AM, Kengo Seki wrote:
> > >>>>>>>> Thanks for the comment, Cos! I was able to start the docker service on
> > >>>>>>>> docker-slave-02 without replacing it and am running some Jenkins jobs
> > >>>>>>>> on it now, so I'll replace it in the near future.
> > >>>>>>>> I have a few additional things I'd like to ask:
> > >>>>>>>>
> > >>>>>>>> * docker-slave-02 and 03 have a gp2 root volume with only 8GiB
> > >>>>>>>> capacity, and they sometimes run out of space and stop the CI.
> > >>>>>>>> May I increase them to 20 or 30 GiB when I replace those instances?
> > >>>>>>>> (I'm not sure what our budget is.)
> > >>>>>>>>
> > >>>>>>>> * They use a 30GiB instance store for docker images, and it also
> > >>>>>>>> sometimes runs short.
> > >>>>>>>> There seem to be two unused 1TiB volumes (vol-ae71114e and
> > >>>>>>>> vol-4efa69ae) in the AWS console.
> > >>>>>>>> May I attach them to 02 and 03 instead of the instance stores, or are
> > >>>>>>>> they backups or something?
> > >>>>>>>>
> > >>>>>>>> Kengo Seki <[email protected]>
> > >>>>>>>>
> > >>>>>>>> On Mon, Nov 2, 2020 at 6:41 PM Konstantin Boudnik <[email protected]> wrote:
> > >>>>>>>>>
> > >>>>>>>>> I'd say let's replace the broken one. I don't think there's any
> > >>>>>>>>> sentimental value attached ;)
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> With regards,
> > >>>>>>>>>  Cos
> > >>>>>>>>>
> > >>>>>>>>> On 02.11.2020 08:16, Kengo Seki wrote:
> > >>>>>>>>>> Thanks for the update, Olaf! I've just noticed that the Jenkins UI
> > >>>>>>>>>> looks cool now :)
> > >>>>>>>>>> Regarding docker-slave-02, I'll try to replace it after waiting a
> > >>>>>>>>>> while to make sure there's no objection.
> > >>>>>>>>>>
> > >>>>>>>>>> Kengo Seki <[email protected]>
> > >>>>>>>>>>
> > >>>>>>>>>> On Mon, Nov 2, 2020 at 1:39 PM Jun HE <[email protected]> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks a lot for the update, Olaf!
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Sat, Oct 31, 2020 at 3:24 AM Olaf Flebbe <[email protected]> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> All machines are patched. Jenkins and its plugins are updated.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Things to be noted:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> * Slave 2 seems to be in serious trouble. I would say the disk image
> > >>>>>>>>>>>> is corrupt; one of the symptoms is that docker does not start anymore.
> > >>>>>>>>>>>> Is there anything important on it? If yes, please contact me. I would
> > >>>>>>>>>>>> recommend setting up slave2 from scratch again.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> * There was a warning regarding the Copy Artifacts Plugin. It now
> > >>>>>>>>>>>> imposes stricter rules. I'm not sure if any job depends on it.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> * I removed the CVS plugin.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Everything else seems to be working as usual.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>> Olaf
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On 30.10.2020 at 19:09, Olaf Flebbe <[email protected]> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I am doing an update of the machines in CI. It seems a couple of
> > >>>>>>>>>>>>> security fixes need to be applied.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Olaf
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>
> > >>>>
> > >>>>
> > >>
> >
>
