Hi Sheng,

While I'm generally all in favour of widening access to distribute the
tasks, the situation around the CI system in particular is a bit more
difficult.

As far as I know, the creation of the CI system is not automated,
versioned, backed up or otherwise safeguarded. This means that if somebody
makes a change that breaks something, we're left with a broken system we
can't recover from. Thus, in the past I preferred to restrict access as
much as possible (at least to Prod) to prevent these situations. While #1
and #2 are already possible today (we have two roles for committers and
regular users that allow this already), #3 and #4 come with a significant
risk to the stability of the system.

As soon as a job is added or changed, a lot of things happen in Jenkins -
one of these tasks is the SCM scan, which tries to determine the branches
the job should run on. For somebody who is inexperienced, the first pitfall
is that suddenly hundreds of jobs are spawned, which will certainly
overload Jenkins and render it unusable. There are a lot of tricks and I
could elaborate on them, but the bottom line is that the configuration
interface of Jenkins is far from fail-proof and poses a significant risk if
accessed by somebody who doesn't know exactly what they're doing - that is,
we would need to design some kind of training, and even that would not
safeguard us from these fatal events.

There's also the whole security aspect around user-facing artifact
generation in CI/CD and the possibility of those artifacts being tampered
with, but I don't think I have to elaborate on that.

With regard to #4 especially, I'd say that somebody just upgrading the
system or changing plugins carries an even bigger risk. Plugins are
notoriously unsafe, and system updates have also proven to be anything but
a breeze. I'd argue that changes to the system should only be made by its
administrators, since they have a better overview of everything that is
currently going on while also having the full access (backups before
making changes, SSH access, log access, metric access, etc.) to debug
errors. In the end we shouldn't forget that this is a production system -
usually, you'd have nobody being able to touch it at all, but we're not in
a perfect world, so I'd say we should restrict it to a bare minimum in the
form of admins.

So while I certainly understand and encourage distributing access, I don't
feel comfortable widening the access to such a critical production system.
It being down means that development on GitHub is fully halted, which is
really problematic since we don't have rollback mechanisms.

Best regards,
marco

On Sun, Sep 15, 2019 at 6:40 AM Sheng Zha <zhash...@apache.org> wrote:

> Hi,
>
> I'd like to initiate discussion on how access control should be managed
> for the CI system. The hope is that we can present the conclusion of this
> discussion as the recommendation and request to the donors of the CI system
> from Amazon.
>
> The specific aspects I'd like to discuss are the abilities to:
> 1. trigger PR validation and nightly jobs.
> 2. trigger continuous delivery jobs, such as for binary releases in pip,
> maven, and dockerhub.
> 3. add jobs to the CI system.
> 4. maintain and manage the CI system, such as system upgrades and jenkins
> plugin installation.
>
> Given that we already have GitHub SSO enabled on the Jenkins CI, I suggest
> the following authentication levels for these items:
> 1. all authenticated GitHub users.
> 2-4. all MXNet committers
>
> What do you think? If you have more aspects that you wish to discuss, feel
> free to propose.
>
> -sz
>
