Re: Commit loss prevention

Kohsuke Kawaguchi Thu, 14 Nov 2013 10:11:54 -0800

On 11/14/2013 09:54 AM, domi wrote:

I think this was an exception and we should treat it as such…

Yes, I agree. And we were able to recover all the commits after all, so I don't think we need to throw the baby out with the bath water.

Sure, this could happen again, but with some backups we should be
fine. Maybe we had better ask GH why they provide the feature to
block forced pushes just in their enterprise solution.

Yes, we will ask about this feature. But even if GitHub disables forced pushes, that's still not enough to prevent accidental or malicious data loss.

For example, if you look at a similar incident that happened a few years ago in Eclipse [1], I bet the loss there happened through mass deletion, not forced updates. (Thanks, Dariusz, for this pointer!)

What I think we want GitHub to consider is the equivalent of the "History Protection" feature Dariusz wrote about, as implemented in CollabNet.

But until that comes, I guess we are on our own to emulate that without direct access to the server.



[1] https://bugs.eclipse.org/bugs/show_bug.cgi?id=361707

/Domi

On 14.11.2013, at 18:50, Kohsuke Kawaguchi <[email protected]> wrote:



Hmm, I don't fully understand the Maven implications of such a setup, but
switching canonical repositories from one location to another involves a whole
lot more than mass-updating pom.xml: communication, infrastructure management,
pull requests, access control, and backups. So I'm pretty certain it's not as
easy as you make it sound...

And I'm not yet sensing the appetite in the community for moving away from 
GitHub.


On 11/12/2013 02:16 AM, Stephen Connolly wrote:

I think part of the issue is that our canonical repositories are on
github...

I would favour jenkins-ci.org being master of
its own destiny... hence I would recommend hosting canonical repos on
project-owned hardware and using GitHub as a mirror of those canonical
repositories... much like the way the ASF uses GitHub. That would allow us to
implement policies such as preventing forced pushes to specific branches,
etc...
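
As an illustration of the kind of policy meant here: on a self-hosted repository, Git's `receive.denyNonFastForwards` setting rejects forced pushes for the whole repository, while a per-branch rule is usually written as an `update` hook, which Git calls with the ref name and the old and new object names. The sketch below shows only the decision logic in Python. The pattern list and function names are hypothetical, and the fast-forward check is assumed to be computed elsewhere.

```python
import fnmatch

# Hypothetical list of protected refs; a real project would choose its own.
PROTECTED_PATTERNS = ["refs/heads/master", "refs/heads/stable-*"]


def is_protected(refname, patterns=PROTECTED_PATTERNS):
    """Return True if the ref name matches any protected pattern."""
    return any(fnmatch.fnmatchcase(refname, pattern) for pattern in patterns)


def review_update(refname, is_fast_forward, patterns=PROTECTED_PATTERNS):
    """Return None to accept the update, or a rejection message."""
    if is_protected(refname, patterns) and not is_fast_forward:
        return f"rejected: non-fast-forward update to protected ref {refname}"
    return None
```

A hook script would print the message and exit with a non-zero status whenever `review_update` returns one, which makes Git refuse that ref update.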

Of course that would mean another pom.xml <scm> update, namely the
<developerConnection> would point to the canonical repo while the
<connection> would point to the github repo... (with some use of
http://developer.github.com/v3/users/keys/#list-public-keys-for-a-user
we should be able to let users just register their keys in github)

e.g. the <scm> details would look like:

  <scm>
    <connection>scm:git:git://github.com/jenkinsci/[plugin name]-plugin.git</connection>
    <developerConnection>scm:git:git.jenkins-ci.org:jenkinsci/[plugin name]-plugin.git</developerConnection>
    <url>http://github.com/jenkinsci/[plugin name]-plugin</url>
  </scm>

Maven will then do the "right thing" for pushing releases *even if you
checkout from github*... and we just have the canonical repos force push
to github and put proper permission sets on the canonical repos... most
developers will thus see no effective difference :-)


On 12 November 2013 06:25, Kohsuke Kawaguchi <[email protected]> wrote:

   Now that the commits have been recovered and things are almost back
   to normal, I think it's time to think about how to prevent this kind
   of incident in the future.

   Our open commit access policy was partly made possible by the idea
   that any bad commit can always be rolled back. But what I failed
   to think through was that changes to refs aren't themselves
   version controlled, so it is possible to lose commits through
   incorrect ref manipulation, such as "git push -f", or by deleting a
   branch.

   I still feel strongly that we should maintain the open commit access
   policy. This is how we've been operating for the longest time, and
   it's also because otherwise adding/removing developers to
   repositories would be prohibitively tedious.

   So my proposal is to write a little program that uses the GitHub Events
   API to keep track of push activity in our repositories. For every
   update to a ref in a repository, we can record the timestamp, the SHA1
   before and after, and the user ID. We can maintain a text file for every
   ref in every repository, and the program can append lines to it. In
   other words, we effectively recreate a server-side reflog outside GitHub.
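
To make the record-keeping idea concrete, here is a minimal sketch in Python of the "one text file per ref, append a line per update" scheme. It is only an illustration, not the program proposed above. The directory layout, the line format, and the function names are assumptions, and the event dictionary is assumed to follow the general shape of a GitHub push event (`created_at`, `actor.login`, `repo.name`, and a payload with `ref`, `before`, and `head`). Fetching events over the network is left out.

```python
import os


def reflog_path(base_dir, repo, ref):
    """Return the log file path for one ref of one repository."""
    # e.g. <base>/jenkinsci/example-plugin/refs/heads/master.log
    return os.path.join(base_dir, *repo.split("/"), *ref.split("/")) + ".log"


def record_ref_update(base_dir, repo, ref, timestamp, before, after, user):
    """Append one line describing a ref update, creating the file if needed."""
    path = reflog_path(base_dir, repo, ref)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as log:
        log.write(f"{timestamp} {before} {after} {user}\n")
    return path


def record_push_event(base_dir, event):
    """Record a push event given as a dict (assumed shape, see the text above)."""
    payload = event["payload"]
    return record_ref_update(
        base_dir,
        event["repo"]["name"],
        payload["ref"],
        event["created_at"],
        payload["before"],
        payload["head"],
        event["actor"]["login"],
    )
```

Because the files are append-only plain text, they are easy to mirror with rsync and to inspect by hand after an incident.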

   The program should also fetch commits, so that it has a local copy
   of every commit that ever landed in our repositories. Doing this
   also allows the program to detect non-fast-forward updates. It should warn
   us in that situation, and create a local ref on the old commit
   to prevent it from getting lost.
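
The non-fast-forward check described above comes down to one question: is the old commit still reachable from the new one? The Python sketch below answers it over an in-memory parent map so that it runs without a repository. In a real mirror the same question could be put to Git with `git merge-base --is-ancestor`. The `refs/lost/` namespace and the function names are hypothetical, and the all-zero SHA follows Git's convention for a ref that does not exist.

```python
from collections import deque

ZERO = "0" * 40  # Git's placeholder for "no commit" on ref creation or deletion


def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    seen = set()
    queue = deque([descendant])
    while queue:
        sha = queue.popleft()
        if sha == ancestor:
            return True
        if sha in seen:
            continue
        seen.add(sha)
        queue.extend(parents.get(sha, ()))
    return False


def check_ref_update(parents, before, after):
    """Classify a ref update; also name a ref that would keep `before` alive."""
    if before == ZERO:
        return ("created", None)
    if after != ZERO and is_ancestor(parents, before, after):
        return ("fast-forward", None)
    kind = "deleted" if after == ZERO else "non-fast-forward"
    # Hypothetical namespace for commits that would otherwise become unreachable.
    return (kind, f"refs/lost/{before}")
```

When the second element of the result is not `None`, the monitoring program would warn and create that ref locally, pointing at the old commit.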

   We can then make these repositories accessible via rsync to
   encourage people to mirror them for backup, or we can make them
   publicly accessible by hosting them on GitHub as well, although the
   latter could be confusing.

   With a scheme like this, pushes can be safely recorded within a
   minute or so (and this number can go down even further if we use
   webhooks). If a data loss occurs before the program gets to record
   newly pushed commits, we should still be able to record who pushed
   afterward to identify who has the commits that were lost. With such
   a small time window between the push and the record, the number of
   such lost commits should be low enough that we can recover them
   manually.

   --

   Kohsuke Kawaguchi

   --
   You received this message because you are subscribed to the Google
   Groups "Jenkins Developers" group.
   To unsubscribe from this group and stop receiving emails from it,
   send an email to [email protected].
   For more options, visit https://groups.google.com/groups/opt_out.





--
Kohsuke Kawaguchi | CloudBees, Inc. | http://cloudbees.com/
Try Jenkins Enterprise, our professional version of Jenkins
