> On Oct 7, 2016, at 10:23, Jan Lahoda <[email protected]> wrote:
>
> This may miss some binaries due to file name encoding (and maybe I did
> something wrong in the experiment), but I would not expect gains
> significantly bigger than this.

The current source, if one clones main-golden (be sure to perform a clean to remove build output, and exclude .hg), yields a size of ~1 GB. From inside the project directory:

    du -c -g -I .hg .

Do the same thing for .hg, and you get ~4 GB. This is all, of course, expanded on the drive; mine is a Mac OS X file system.

> Also, not sure if exe and dll are
> problematic - I believe currently a build made on Linux can run on Windows,
> and it may be problematic to achieve this without having the exe/dll
> precompiled.
>

I agree on the precompiled part, but those binaries can technically be packaged and put into a binary repository. As an example, in our local Artifactory I have the Google Chrome Driver for Selenium, which I then add as a dependency to Gradle builds; it works like a charm. So there are ways to do that for the build which don’t impact the source repository.

>
>> Not sure if the ant build already uses ivy. If not then we need to improve
>> this.
>>
>
> NetBeans (currently) uses this:
> http://wiki.netbeans.org/ExternalBinaries
>
> Also note the historical:
> http://wiki.netbeans.org/HgExternalBinaries
>
> The latter explains the .jar/.zip files which are not actual binaries.
>
> Also, I believe the official repos have a push hook in place which prevents
> pushing too big binary files with certain extensions into certain
> directories:
> http://hg.netbeans.org/nb-hooks/file/dfd2d386149f/forbid_external.py
>
>> It also contains temporary build artifacts (well, unfortunately such things
>> happen…)
>>
>>
>> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not
>> even be able to git-clone this over to their own github repos as those are
>> limited to 2GB.
>> So how should we get pull requests in that case?
>>
>> I agree with you that we should preserve the history though.
>> Thus the idea with moving over the original hg repo to some other place
>> and switch it into read-only mode.
>> And have the new GIT repo stripped down to the core parts (of course with
>> their history).
>>
>
> I guess the question is what do you consider a core part. I think it would
> be OK to not keep history in the "main" repo for modules that were placed
> into separate repositories, like:
> http://hg.netbeans.org/community-visualweb/
> (as far as I can tell, the history is kept in the split up repositories.)
> But is e.g. the Java support a core part? C/C++ support? PHP support?
>

The core IDE and even the main build can be split from the other feature-specific support, regardless of what is said to be in the core. Even then, some repos for the sub-components/modules may not “stand alone”, IMO, with regard to the main build as a sibling or parent; this gets into git sub-modules as a way to break up both the size and what one has to check out to work on a specific piece of functionality.

So, the basic pieces to just get the platform up and running, and the build going, are one big sub-set to me. Then, Java-specific support could be another, though it could split into SE and EE. Next, PHP, which seems independent. HTML/Web/etc. is another. Groovy stands out as another. I’m sure that can keep going. Then, if the repos were essentially:

netbeans-core (however it is decided)
netbeans-java
netbeans-javaee
netbeans-groovy
netbeans-php
netbeans-c
netbeans-nodejs (or whatever the name is)
…etc

then that seems manageable to me. There could then be a netbeans-main which, with a little restructuring, pulls in all of those that make up the NetBeans release as an entire structure with git sub-modules. It could be that it breaks down differently.
It could be like this:

netbeans-main (has what we would call core in it…along with the current build.xml and nbbuild…this would allow the current build system to keep working without any, or much, rework, I think)
netbeans-java
netbeans-javaee
… etc etc

and netbeans-main has git sub-modules for all of what is the “NetBeans release”. This would undoubtedly take some work. The history can be interesting in these cases, too.

> For me personally, having history is very important - and having it inside
> my IDE (not in some other repository) is also important. For example, doing
> a change to CasualDiff without history could be quite painful:
> http://hg.netbeans.org/jet-main/annotate/437d7ca35923/java.source.base/src/org/netbeans/modules/java/source/save/CasualDiff.java
>

Given history and regressions I can see that, but how would a change be more painful without the history? I’m asking about the specific context we are referring to. For instance, I can see maintaining some depth of history as helpful for most use cases. Beyond that depth, unless some obscure regression is hit, the oldest history doesn’t seem to come into play very often, other than for historical reasons or to see “who” worked on something. The code is what it is at some point in history.

As an example, if a file with a decade of history has had many changes, it will have a deep history, and that depth almost necessarily means that much or most of the file has changed: certain parts have changed a lot, and thus been completely rewritten, and others not so much. The older history then becomes more of a liability if used for purposes other than identifying who did what; it is hard to sort through and to put in context. “Older” here is relative not to date per se but to the depth, that is, the number of changes and iterations a file has gone through.

Of course, all that said, if the larger git repo were supported by the infra, then users cloning the repository from git could use depth, and most won’t need all the history. Also, until one gets into the low hundreds of GBs, cloning and spanning from a “cloud” perspective isn’t really that big of a deal, so I agree this is a GitHub business-decision limitation more than anything else.

If git did not support depth cloning, I would very much argue that at some point too much history has sharply diminishing returns, as it becomes impractical for a community project’s members (all of them) to clone many, many GB of data. Data plans and connections are not the same worldwide, and generally that history is most useful for seeing whether a regression has been introduced and what exactly changed. The gamble is that a certain depth “back” will not have the regressions, and that the current code is fairly well vetted; so for most cases, taking a depth-based starting point, considering such deep history, should be “mostly” safe and manageable.

Thanks,

Wade
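
For reference, the depth-limited clone mentioned above can be tried out locally. The following is only a minimal sketch: the throwaway repository, its five empty commits, the depth of 2, and the example identity are all assumptions made for illustration, and nothing here touches the real NetBeans repositories.

```shell
#!/bin/sh
# Sketch: compare a full clone with a depth-limited (shallow) clone.
# The temporary repository below is a hypothetical stand-in for a project
# with a long history.
set -e

work=$(mktemp -d)
src="$work/src"

# Build a small source repository with five (empty) commits.
git init -q "$src"
for i in 1 2 3 4 5; do
    git -C "$src" -c user.name=Example -c user.email=example@example.invalid \
        commit -q --allow-empty -m "commit $i"
done

# A file:// URL is used because git ignores --depth for plain local paths.
git clone -q "file://$src" "$work/full"
git clone -q --depth 2 "file://$src" "$work/shallow"

# Count the commits reachable from HEAD in each clone.
full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)

echo "full clone commits:    $full_count"
echo "shallow clone commits: $shallow_count"
```

If older history turns out to be needed later, a shallow clone can be extended with `git fetch --deepen=<n>` or converted to a full one with `git fetch --unshallow`.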
