Thank you for the comments, Evans and Olaf! Following your advice, I
did the following:

* Removed unused EBS volumes (1TBx2 and 30GBx1).
* Replaced slaves (02, 03, 06, 07) with newly created EC2 instances.
  Instance types were also upgraded (m3, m4 -> m5).
* Attached EBS volumes to the above instances: 200GB each to 02 and 03,
  and 800GB each to 06 and 07.
  (I said earlier that I was going to split the two 1TB volumes into four
  500GB ones, but slave 06 had already used 660GB+ before the replacement,
  so I changed the allocation.)

So we're now using the following resources, in accordance with Evans' email:

* EC2 instances: one m3.xlarge (master), three m5.xlarge (slave 02, 03
and 07) and one m5.2xlarge (slave 06)
* EBS volumes: one 1TB gp2 (master), two 200GB gp2 (slave 02 and 03),
and two 800GB gp2 (slave 06 and 07)
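
As a quick sanity check of the allocation above, here is a minimal Python
sketch that tallies the EBS capacity and instance count against the
original deal quoted further down in this thread ("3 1TB EBS volumes", one
m3.2xlarge plus four m3.xlarge). It assumes 1TB is counted as 1000GB, and
the variable and function names are purely illustrative.

```python
# Figures are copied from this thread; names are illustrative only.
# Assumption: 1TB is treated as 1000GB for a rough comparison.
GB_PER_TB = 1000

# Original deal: "3 1TB EBS volumes", "1 m3.2xlarge" + "4 m3.xlarge".
ORIGINAL_DEAL_GB = 3 * GB_PER_TB
ORIGINAL_DEAL_INSTANCES = 1 + 4

# Current EBS volumes (all gp2), in GB.
current_volumes_gb = {
    "master": 1 * GB_PER_TB,
    "slave02": 200,
    "slave03": 200,
    "slave06": 800,
    "slave07": 800,
}

# Current EC2 instance types.
current_instances = {
    "master": "m3.xlarge",
    "slave02": "m5.xlarge",
    "slave03": "m5.xlarge",
    "slave07": "m5.xlarge",
    "slave06": "m5.2xlarge",
}


def total_gb(volumes):
    """Return the summed capacity of all volumes in GB."""
    return sum(volumes.values())


if __name__ == "__main__":
    used = total_gb(current_volumes_gb)
    print(f"EBS: {used}GB used of {ORIGINAL_DEAL_GB}GB")
    print(f"EC2: {len(current_instances)} of {ORIGINAL_DEAL_INSTANCES} instances")
```

Under that assumption, both the total EBS capacity and the number of
instances stay within the original deal.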

The new servers seem to work fine with Jenkins [1]. I also updated the
"Bigtop CI Setup Guide" page on cwiki [2].
Let me know if you find something wrong! :)

[1]: https://ci.bigtop.apache.org/computer/
[2]: https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+CI+Setup+Guide#BigtopCISetupGuide-SetupJenkinsslaves

Kengo Seki <[email protected]>

On Wed, Nov 11, 2020 at 4:59 AM Olaf Flebbe <[email protected]> wrote:
>
> Hi,
>
> Fully supporting Evans:
> The unconnected disks do not contain anything valuable; please remove them.
> It might even make sense to recreate the current disks on SSD, a bit larger
> than before if needed.
>
> olaf
>
> > On 10.11.2020 at 08:09, Evans Ye <[email protected]> wrote:
> >
> > Yes, I think overall your plan is good.
> > What's the purpose of using EBS snapshots? Is it to back up what we have
> > before the migration?
> > Except for the master node (which has Jenkins settings stored on disk),
> > all those slaves can be wiped out directly.
> >
> >
> >
> > On Tue, Nov 10, 2020 at 2:42 PM Kengo Seki <[email protected]> wrote:
> >
> >> Thanks everyone for the information! Now I understand our circumstances.
> >> So we're going to split the two 1TB volumes attached to slave06 and 07
> >> into four 500GB volumes (and change their type to gp2), reattach them
> >> to 02, 03, 06 and 07, and remove the two currently unused 1TB volumes,
> >> right?
> >>
> >>> Kengo, would you like to take this, or do you need help?
> >>
> >> I think I can do this somehow (maybe using EBS snapshots?), but let me
> >> ask for your help if I get stuck. :)
> >>
> >> Kengo Seki <[email protected]>
> >>
> >> On Tue, Nov 10, 2020 at 1:00 AM Evans Ye <[email protected]> wrote:
> >>>
> >>> OK. I got it now.
> >>> So the newly created volumes are currently attached to slave06_2 and
> >>> slave07_2, respectively.
> >>> However, they're standard HDD, not GP2 SSD. I think we can take this
> >>> chance to recreate those 2 slaves and do an overhaul of our
> >>> infrastructure.
> >>>
> >>> Kengo, would you like to take this, or do you need help?
> >>>
> >>> Evans
> >>>
> >>> On Fri, Nov 6, 2020 at 2:40 AM Olaf Flebbe <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> OMG. I think I did it.
> >>>>
> >>>> A few years ago two of the instances had hardware problems and did not
> >>>> reboot any more; the filesystem was corrupted and so on. That was at
> >>>> the time of the Spectre vulnerability discovery (2018). At that time
> >>>> AWS had major instabilities, since updating firmware seems to have
> >>>> failed for some classes of hardware.
> >>>>
> >>>> I tried to recreate them as closely as possible, but I may have
> >>>> accidentally left the volumes around. Please let's delete them.
> >>>>
> >>>> Olaf
> >>>>
> >>>>> On 05.11.2020 at 14:44, Konstantin Boudnik <[email protected]> wrote:
> >>>>>
> >>>>> Thanks Evans!
> >>>>>
> >>>>> It's great you found the details: they are definitely accurate as I
> >>>>> am recalling now. Kengo, do you think splitting the volumes would
> >>>>> help us for a while? Or perhaps we shall try to expand the resource
> >>>>> pool (which might take a while)?
> >>>>>
> >>>>> Thanks!
> >>>>> Cos
> >>>>>
> >>>>> On Thu, Nov 05, 2020 at 12:32PM, Evans Ye wrote:
> >>>>>> In fact, the original deal for our resources is as follows:
> >>>>>>
> >>>>>>> 1 m3.2xlarge for CI
> >>>>>>> 4 m3.xlarge for CI and demo
> >>>>>>> 3 1TB EBS volumes
> >>>>>>> 5 elastic IP addresses
> >>>>>>
> >>>>>> So technically we should not use those 2 additional 1TB volumes
> >>>>>> (created in 2018).
> >>>>>> Instead, I think what we can do is split up one of the existing 1TB
> >>>>>> volumes (e.g. the one attached to slave07) into smaller volumes for
> >>>>>> slave02 and 03.
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Nov 4, 2020 at 2:28 PM Konstantin Boudnik <[email protected]> wrote:
> >>>>>>
> >>>>>>> Kengo,
> >>>>>>>
> >>>>>>> We had an agreement with the EMR folks that we are using the
> >>>>>>> resources available to us and that it is included in their budget
> >>>>>>> (or something to this extent). If you see some resources available
> >>>>>>> under our account, I don't see why we can't use them.
> >>>>>>>
> >>>>>>> If for whatever reason we need to expand the pool, that would
> >>>>>>> require a separate conversation with the nice folks from that team,
> >>>>>>> I imagine. Please let me know if I can help with this going forward.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>> Cos
> >>>>>>>
> >>>>>>> On Wed, Nov 04, 2020 at 11:11AM, Kengo Seki wrote:
> >>>>>>>> Thanks for the comment, Cos! I was able to start the docker service
> >>>>>>>> on docker-slave-02 without replacing it and am running some Jenkins
> >>>>>>>> jobs on it now, so I'll replace it in the near future.
> >>>>>>>> I have a few additional things that I'd like to ask:
> >>>>>>>>
> >>>>>>>> * docker-slave-02 and 03 have a gp2 root volume with only 8GiB of
> >>>>>>>> capacity, and they sometimes run short and stop the CI.
> >>>>>>>> May I increase them to 20 or 30 GiB when I replace those instances?
> >>>>>>>> (I'm not sure what our budget is)
> >>>>>>>>
> >>>>>>>> * They use a 30GiB instance store to hold docker images, and it
> >>>>>>>> also sometimes runs short.
> >>>>>>>> It seems there are two unused 1TiB volumes (vol-ae71114e and
> >>>>>>>> vol-4efa69ae) on the AWS console.
> >>>>>>>> May I attach them to 02 and 03 instead of the instance stores, or
> >>>>>>>> are they backups or something?
> >>>>>>>>
> >>>>>>>> Kengo Seki <[email protected]>
> >>>>>>>>
> >>>>>>>> On Mon, Nov 2, 2020 at 6:41 PM Konstantin Boudnik <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> I'd say let's replace the broken one. I don't think there's any
> >>>>>>>>> sentimental value attached ;)
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> With regards,
> >>>>>>>>>  Cos
> >>>>>>>>>
> >>>>>>>>> On 02.11.2020 08:16, Kengo Seki wrote:
> >>>>>>>>>> Thanks for updating, Olaf! I've just noticed the Jenkins UI
> >>>>>>>>>> became cool :)
> >>>>>>>>>> Regarding docker-slave-02, I'll try to replace it after waiting
> >>>>>>>>>> for a while to make sure there's no objection.
> >>>>>>>>>>
> >>>>>>>>>> Kengo Seki <[email protected]>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Nov 2, 2020 at 1:39 PM Jun HE <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks a lot for the update, Olaf!
> >>>>>>>>>>>
> >>>>>>>>>>> On Sat, Oct 31, 2020 at 3:24 AM Olaf Flebbe <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> All machines are patched. Jenkins and its plugins are updated.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Things to be noted:
> >>>>>>>>>>>>
> >>>>>>>>>>>> * Slave 2 seems to be in serious trouble. The disk image seems
> >>>>>>>>>>>> to be corrupt, I would say.
> >>>>>>>>>>>> One of the problems: docker does not start any more.
> >>>>>>>>>>>> Is there anything important on it? If yes, please contact me.
> >>>>>>>>>>>> I would recommend setting up slave2 from scratch again.
> >>>>>>>>>>>>
> >>>>>>>>>>>> * There was a warning regarding the Copy Artifacts Plugin. It
> >>>>>>>>>>>> now imposes stricter rules. Not sure if there is a job
> >>>>>>>>>>>> depending on it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> * I removed the CVS plugin.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Everything else seems to be working as usual.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Olaf
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 30.10.2020 at 19:09, Olaf Flebbe <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am doing an update of the machines in CI. It seems a couple
> >>>>>>>>>>>>> of security fixes are to be applied.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Olaf
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>
> >>>>
> >>>>
> >>
>
