Thank you for following through with this after we talked on IRC. I will check the size reduction for the releases/ repo later.
On Fri, 11 Nov 2016 at 07:45, Gregory Szorc <[email protected]> wrote:
> I'm a Mercurial developer who is also responsible for running
> https://hg.mozilla.org/ and supporting Mercurial at Mozilla. I understand
> NetBeans is contemplating its version control future because the ASF only
> supports Subversion and Git. I think I've learned some things that may be
> helpful to you.
>
> First, the NetBeans "main" repo is on the same order of magnitude as (but
> marginally smaller than) the Firefox repository in terms of file count and
> repository data size. So generally speaking, what I have learned supporting
> Firefox can apply to NetBeans.
>
> While I understand Mercurial may not be in your future, I'd like to point
> out that hg.netbeans.org is running a very old and very slow version of
> Mercurial (likely a release from before July 2010). The high volume of
> merge commits in the "main" repo contributes to highly sub-optimal storage
> utilization in old versions of Mercurial. This makes clones and pulls
> significantly slower due to more data to transfer and contributes to
> significant CPU load on the server to read/encode the sub-optimal storage
> encoding. I wouldn't be surprised if you have CPU load issues on the
> server.
>
> As it is stored today, the "main" repository is almost exactly 3 GB. If you
> create a new repository with optimal storage encoding, using Mercurial 3.7
> or newer so that "generaldelta" is the default storage format and
> configuring the repository to recalculate optimal deltas, the repository
> size drops to ~1.1 GB. This can be done as follows:
>
> $ hg init main-optimal
> $ cd main-optimal
> $ hg --config format.generaldelta=true --config format.aggressivemergedeltas=true pull https://hg.netbeans.org/main
> <wait a long time>
>
> Now, for your VCS future.
>
> I'm a huge proponent of monorepos for productivity reasons. I've seen
> discussion on this list about splitting the repo. I would discourage that.
> I'd encourage you to read https://danluu.com/monorepo/ and the linked
> articles at the bottom for more on the topic.
>
> Unfortunately, one of the practical concerns about monorepos is they don't
> scale with some version control tools, namely Git. This leads many to let
> deficiencies in tools drive workflow decisions, which is quite unfortunate
> because tools should enhance productivity, not hinder it. If NetBeans uses
> Git and maintains the "main" repo as is, I believe you'll experience the
> following performance issues now or in the future as the repository keeps
> growing:
>
> * You'll constantly be dealing with CPU explosions on the Git server
> generated from clients performing clones and large pulls. GitHub uses a
> server infrastructure that caches certain operations related to packfiles
> to help mitigate this. I'm not sure of the state of ASF's Git server.
>
> * In many cases, shallow clones can require more CPU on the Git server to
> process than full clones. This is because the server essentially has to
> read objects from packs and repack things instead of doing a fastpath that
> effectively streams a packfile to a client.
>
> * Garbage collection could be problematic on the server and client.
>
> Now, Git is constantly improving, so these problems may not always
> exist. And as much as GitHub does well at scaling - better than a vanilla
> Git install - it isn't a silver bullet. On a few occasions, processes at
> Mozilla have overwhelmed GitHub and resulted in GitHub disabling access to
> repositories! That hasn't happened in a while though (partially through
> them scaling better and partially through us learning our lesson and not
> pointing hundreds of machines at large Git repos). I'm not sure what, if
> anything, ASF's Git server has done to mitigate load from large
> repositories.
>
> It's worth noting that while some of the server-side CPU issues exist in
> default Mercurial installations, there are mitigations.
> The "clonebundles" extension allows a server to advertise pre-generated
> "bundle" files of repository content. When a client clones, it downloads a
> large bundle from a static file server, then goes back to the Mercurial
> server and gets the data changed since the bundle was created. If you run
> `hg clone https://hg.mozilla.org/mozilla-unified` with a modern Mercurial
> client, your client will grab a 1+ GB file from a CDN and our servers will
> spend maybe 5s of total CPU to service the clone. The clones are faster for
> clients and the server can scale clones nearly infinitely. It is a win all
> around.
>
> Anyway, Mercurial's ability to scale doesn't help you if your choices are
> Subversion or Git :/
>
> Given those choices, I would lean towards Subversion if you want to
> maintain the "main" repo as is. If you use the "main" repo as is with Git,
> you should really do due diligence with the Git server operator to make
> sure they won't be overwhelmed.
>
> If you split the "main" repo, go with Git if your users prefer Git over
> Subversion.
>
> A compromise option would be to keep everything in a monorepo in Subversion
> and have separate Git repositories for specific subdirectories or "views."
> This is often a win-win but requires a bit of tooling to do the syncing.
> Speaking of syncing, it should be unidirectional: bi-directional syncing of
> anything is a hard problem, and take it from someone who has hacked on
> bi-directional VCS syncing: it is not something you want to support.
> Instead, I recommend abstracting the process of "pushing to the canonical
> repo" into something a machine does, and having it perform the VCS
> conversion to the canonical repo and do the actual push. For example,
> landing something from Git would have a server fetch that Git ref and
> replay the commits as Subversion commits (or squash and commit to preserve
> atomicity).
>
> Anyway, I think this wall of text is long enough.
> Reply if you have any questions.
>
> Gregory
