Hi Sheng,

while I'm in general all in favour of widening access to distribute the tasks, the situation around the CI system in particular is a bit more difficult.
As far as I know, the creation of the CI system is not automated, versioned, backed up or otherwise safeguarded. This means that if somebody makes a change that breaks something, we're left with a broken system that we can't recover. Thus, I have preferred in the past to restrict access as much as possible (at least to Prod) to keep these situations from happening.

While #1 and #2 are already possible today (we have two roles, for committers and regular users, that already allow this), #3 and #4 come with a significant risk to the stability of the system. As soon as a job is added or changed, a lot of things happen in Jenkins. One of these is the SCM scan, which tries to determine the branches the job should run on. For somebody who is inexperienced, the first pitfall is that hundreds of jobs are suddenly spawned, which will certainly overload Jenkins and render it unusable. There are a lot of tricks and I could elaborate on them, but the bottom line is that the configuration interface of Jenkins is far from fail-proof and poses a significant risk if accessed by somebody who doesn't know exactly what they're doing. That is, we would need to design some kind of training, and even that would not safeguard us from these fatal events. There's also the whole security aspect around the user-facing artifacts generated by CI/CD and the possibility of them being tampered with, but I don't think I have to elaborate on that.

With regard to #4 especially, I'd say that somebody just upgrading the system or changing plugins carries an even bigger risk. Plugins are notoriously unsafe, and system updates have also proven not to go smoothly. I'd argue that changes to the system should only be made by its administrators, since they have a better overview of everything that is currently going on and also have the full access (backups before making changes, SSH access, log access, metric access, etc.) needed to debug errors.
In the end we shouldn't forget that this is a production system. Usually, nobody would be able to touch it at all, but we're not in a perfect world, so I'd say we should restrict access to a bare minimum in the form of admins. So while I certainly understand and encourage distributing the access, I don't feel comfortable widening the access to such a critical production system. It being down means that development on GitHub is fully halted, which is really problematic since we don't have rollback mechanisms.

Best regards,
Marco

On Sun, Sep 15, 2019 at 6:40 AM Sheng Zha <zhash...@apache.org> wrote:

> Hi,
>
> I'd like to initiate discussion on how access control should be managed
> for the CI system. The hope is that we can present the conclusion of this
> discussion as the recommendation and request to the donors of the CI system
> from Amazon.
>
> The specific aspects I'd like to discuss are the abilities to:
> 1. trigger PR validation and nightly jobs.
> 2. trigger continuous delivery jobs, such as for binary releases in pip,
> maven, and dockerhub.
> 3. add jobs to the CI system.
> 4. maintain and manage the CI system, such as system upgrades and jenkins
> plugin installation.
>
> Given that we already have GitHub SSO enabled on the Jenkins CI, I suggest
> the following authentication levels for these items:
> 1. all authenticated GitHub users.
> 2-4. all MXNet committers
>
> What do you think? If you have more aspects that you wish to discuss, feel
> free to propose.
>
> -sz
>