[jira] [Comment Edited] (MESOS-8162) Binary data causes bloat in the git repository

Andrew Schwartzmeyer (JIRA) Thu, 02 Nov 2017 11:17:19 -0700

    [ 
https://issues.apache.org/jira/browse/MESOS-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236311#comment-16236311
 ]


Andrew Schwartzmeyer edited comment on MESOS-8162 at 11/2/17 6:16 PM:
----------------------------------------------------------------------

I share the common opinion on rewriting a repo's history:

> but it just feels disruptive.

This is an understatement. Every developer, for every copy of the repo they 
have (remote and local forks), has to carefully replace that copy and 
transition over in-progress work that isn't yet upstream. There _will_ be 
information loss, such as local reflogs and stashes.

Rewriting a repo's history is generally very ill-advised, because of the amount 
of work it causes for every developer.

Just myself as an example: I would need to replace nine copies of the 
repository (four machines, each with two worktrees, plus GitHub), and I would 
be quite unhappy to lose my stashes, reflogs, and archived branches on all of 
those. I have twenty branches under archive/* on GitHub that I keep for 
reference (or for testing, experiments, etc.), plus ten active branches, all of 
which an upstream rewrite would force me to carefully graft onto the new repo.

Now multiply this by every Mesos developer.

My point is: we should be careful about bloating the repo (and clearly haven't 
been, since Hadoop is in the history), but we should very much avoid rewriting 
the history. We do not yet have a pathologically sized repository; I do not 
believe we gain much from rewriting it, and firmly believe we would cause an 
extraordinary amount of work for every Mesos developer.

Oh man, and I didn't even mention consumers of Mesos that might have copies of 
the repo.

> Binary data causes bloat in the git repository
> ----------------------------------------------
>
>                 Key: MESOS-8162
>                 URL: https://issues.apache.org/jira/browse/MESOS-8162
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Michael Park
>
> Since Git doesn't handle binary files all that well, the way in which the
> {{3rdparty}} directory is managed continues to bloat the size of our
> repository. There is a ~100M copy of Hadoop from a long time ago that's
> still stored, plus several older versions of ZooKeeper at ~20M each, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)