We need to run some tests on the scalability of the Events API, because:

1) we need to monitor over 1,000 repos (one call per repo? one call for all?)
2) when monitoring the entire jenkinsci org, 300 events might not be enough in case of a catastrophic event.
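To make that trade-off concrete before testing, here is a rough back-of-the-envelope sketch in Python. The numbers are assumptions to verify: GitHub's documented 5,000 authenticated requests/hour rate limit, and the 300-event window on the org feed mentioned above.

```python
# Rough capacity estimate for polling the GitHub Events API.
# Assumed numbers: 5,000 authenticated requests/hour rate limit,
# and an org events feed that retains at most 300 events.

RATE_LIMIT_PER_HOUR = 5000
ORG_EVENT_WINDOW = 300

def min_poll_interval_seconds(repos, rate_limit=RATE_LIMIT_PER_HOUR):
    """Shortest interval at which every repo can be polled once
    without exhausting the hourly rate limit."""
    return 3600.0 * repos / rate_limit

def org_feed_safe(events_per_hour, polls_per_hour, window=ORG_EVENT_WINDOW):
    """True if polling the single org feed cannot miss events: the
    events arriving between two polls must fit inside the window."""
    return events_per_hour / polls_per_hour <= window

# One call per repo: 1,000 repos can each be polled at most every 12 minutes.
per_repo_interval = min_poll_interval_seconds(1000)

# One call for the whole org: polling once a minute tolerates up to
# 300 * 60 = 18,000 events/hour before the 300-event window overflows.
burst_ok = org_feed_safe(events_per_hour=18000, polls_per_hour=60)
```

Under these assumed numbers, per-repo polling is rate-limit-bound, while a single org-feed poll is window-bound: the 300-event window, not the rate limit, is the thing the tests need to probe.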
Should we work at the webhook level? I'll investigate further the reliability and scalability of the API (on a series of *test* repos *OUTSIDE* the Jenkins CI organisation).

Luca.

On 13 Nov 2013, at 18:56, Kohsuke Kawaguchi <[email protected]> wrote:
>
> On 11/11/2013 11:05 PM, Luca Milanesio wrote:
>> It seems a very good idea; it is basically a remote audit trail.
>>
>> The only concern is the throttling on the GitHub API: it would be better
>> to do the scripting on a local mirror of the GitHub repos. When you
>> receive a forced update, you still have all the previous commits and
>> the full reflog.
>
> With respect to throttling, the Events API is designed for polling [1], so we
> just need to poll the events for the entire jenkinsci org [2] and we'll have
> the whole history.
>
> We already maintain the equivalent of local mirrors of the GitHub repos at
> http://git.jenkins-ci.org/. The problem is that reflogs do not record remote
> ref updates, so this will not protect against accidental ref manipulations.
>
> It does help for the purpose of retaining commit objects, though, so we need
> to keep it.
>
>> However, as you said, by being triggered via a webhook the number of API
>> calls can be reduced to a minimum.
>>
>> I would submit a proposal to the Git mailing list for a "fetch by SHA1",
>> which is a missing feature in Git, IMHO.
>
> My recollection is that this was intentional, for security reasons: if a push
> is made accidentally and then removed, those objects shouldn't be accessible.
>
> I think what would be useful and safe is to allow us to create a ref remotely
> on an object that doesn't exist locally. The transport-level protocol already
> allows this, so it would be nice to expose it.
>
>> Thanks to everyone, including GitHub, for the help and cooperation in
>> getting this sorted out!!
>
> [1] http://developer.github.com/v3/activity/events/
> [2] https://api.github.com/orgs/jenkinsci/events
>
>>
>> Luca
>> ---------
>> Sent from my iPhone
>> Luca Milanesio
>> Skype: lucamilanesio
>>
>>
>> On 12 Nov 2013, at 06:25, Kohsuke Kawaguchi <[email protected]> wrote:
>>
>>> Now that the commits have been recovered and things are almost back to
>>> normal, I think it's time to think about how to prevent this kind of
>>> incident in the future.
>>>
>>> Our open commit access policy was made possible partly by the idea
>>> that any bad commit can always be rolled back. But what I failed to
>>> think through was that changes to refs aren't themselves version
>>> controlled, so it is possible to lose commits through incorrect
>>> ref manipulation, such as "git push -f", or by deleting a branch.
>>>
>>> I still feel strongly that we should maintain the open commit access
>>> policy. This is how we've been operating for the longest time, and
>>> otherwise adding/removing developers to repositories would be
>>> prohibitively tedious.
>>>
>>> So my proposal is to write a little program that uses the GitHub Events
>>> API to keep track of push activities in our repositories. For every
>>> update to a ref in a repository, we can record the timestamp, the SHA1
>>> before and after, and the user ID. We can maintain a text file for every
>>> ref in every repository, and the program can append lines to it. In
>>> other words, we effectively recreate the server-side reflog outside GitHub.
>>>
>>> The program should also fetch commits, so that it has a local copy of
>>> every commit that ever landed in our repositories. Doing this also
>>> allows the program to detect non-fast-forward pushes. It should warn us
>>> in that situation, and it will create a local ref on the commit to
>>> prevent it from getting lost.
>>>
>>> We can then make these repositories accessible via rsync to encourage
>>> people to mirror them for backup, or we can make them publicly
>>> accessible by hosting them on GitHub as well, although the latter
>>> could be confusing.
>>>
>>> With a scheme like this, pushes can be safely recorded within a minute
>>> or so (and this number can go down even further if we use webhooks).
>>> If a data loss occurs before the program gets to record newly pushed
>>> commits, we should still be able to record who pushed afterwards, to
>>> identify who has the commits that were lost. With such a small time
>>> window between the push and the record, the number of lost commits
>>> should be low enough that we can recover them manually.
>>>
>>> --
>>> Kohsuke Kawaguchi
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Jenkins Developers" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/groups/opt_out.
>
> --
> Kohsuke Kawaguchi | CloudBees, Inc. | http://cloudbees.com/
> Try Jenkins Enterprise, our professional version of Jenkins
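Kohsuke's proposal above — one text file per ref, one appended line per push, plus a warning and a pinned ref for non-fast-forward pushes — could be sketched roughly as below. This is only an illustration: the PushEvent field names (`payload.before`, `payload.head`, `actor.login`) are my reading of the v3 Events API and should be verified against real payloads, and the `refs/safety/` namespace is a made-up convention, not anything GitHub or git defines.

```python
import os
import subprocess

def reflog_line(event):
    """Turn one GitHub PushEvent (assumed v3 Events API shape) into a
    (relative file path, reflog-style line) pair: one text file per
    ref per repository, one line per push."""
    payload = event["payload"]
    line = "%s %s %s %s" % (
        payload["before"],        # SHA1 before the push
        payload["head"],          # SHA1 after the push
        event["actor"]["login"],  # user ID of the pusher
        event["created_at"],      # timestamp
    )
    path = os.path.join(event["repo"]["name"], payload["ref"])
    return path, line

def record(event, root="reflogs"):
    """Append the line to the per-ref text file, creating directories
    as needed."""
    path, line = reflog_line(event)
    target = os.path.join(root, path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "a") as f:
        f.write(line + "\n")

def is_fast_forward(clone, before, after):
    """In a local clone that has fetched both commits, a push was a
    fast-forward iff the old tip is an ancestor of the new one."""
    result = subprocess.run(
        ["git", "-C", clone, "merge-base", "--is-ancestor", before, after])
    return result.returncode == 0

def pin(clone, sha):
    """Create a local ref on the commit so it cannot be garbage-collected
    ('refs/safety/' is a hypothetical namespace)."""
    subprocess.run(
        ["git", "-C", clone, "update-ref", "refs/safety/" + sha, sha],
        check=True)
```

The append-only text file per ref is the "server-side reflog recreated outside GitHub" from the proposal; `is_fast_forward` relies on `git merge-base --is-ancestor`, which already implements the ancestry test, and `pin` keeps the old tip reachable so a later `git gc` in the mirror cannot prune it.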
