> On Oct 7, 2016, at 10:23, Jan Lahoda <[email protected]> wrote:
> 
> 
> This may miss some binaries due to file name encoding (and maybe I did
> something wrong in the experiment), but I would not expect gains
> significantly bigger than this.

If one clones main-golden, performs a clean to remove build output, and 
excludes .hg, the current source comes to ~1GB.

From inside the project directory:
du -c -g -I .hg .

Do the same for .hg alone and you get ~4GB. All of this is, of course, the 
expanded size on disk; mine is a Mac OS X file system.
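For anyone not on a Mac: the -g and -I flags above are BSD du options, so here is a rough portable sketch of the same comparison using only POSIX du flags. The function name repo_sizes is made up for illustration, and it approximates the working tree by subtracting the .hg size from the total rather than excluding it during the walk.

```shell
#!/bin/sh
# Hypothetical helper: report working-tree size vs. Mercurial history size.
# Uses only `du -s -k`, which POSIX specifies, instead of BSD-only -g / -I.
repo_sizes() {
    dir=${1:-.}
    # Total size of the checkout in kilobytes, .hg included.
    total_kb=$(du -sk "$dir" | awk '{print $1}')
    hg_kb=0
    if [ -d "$dir/.hg" ]; then
        # Size of the history store alone.
        hg_kb=$(du -sk "$dir/.hg" | awk '{print $1}')
    fi
    echo "working tree: $(( (total_kb - hg_kb) / 1024 )) MB"
    echo "history (.hg): $(( hg_kb / 1024 )) MB"
}

# Run against the directory given as the first argument, or the current one.
repo_sizes "${1:-.}"
```

Run it from inside a clean clone; the numbers will differ a little from du -g because of rounding and file system block sizes.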

> Also, not sure if exe and dll are
> problematic - I believe currently a build made on Linux can run on Windows,
> and it may be problematic to achieve this without having the exe/dll
> precompiled.
> 

I agree on the precompiled part, but those binaries can technically be packaged 
and put into a binary repository. As an example, in our local Artifactory, I 
have the Google Chrome Driver for Selenium, and then add it as a dependency to 
Gradle builds; works like a charm. So, there are ways to do that for the build 
which don’t impact the source repository.

> 
>> Not sure if the ant build already uses ivy. If not then we need to improve
>> this.
>> 
> 
> NetBeans (currently) uses this:
> http://wiki.netbeans.org/ExternalBinaries 
> <http://wiki.netbeans.org/ExternalBinaries>
> 
> Also note the historical:
> http://wiki.netbeans.org/HgExternalBinaries 
> <http://wiki.netbeans.org/HgExternalBinaries>
> 
> The latter explains the .jar/.zip files which are not actual binaries.
> 
> Also, I believe the official repos have a push hook in place which prevents
> pushing too big binary files with certain extensions into certain
> directories:
> http://hg.netbeans.org/nb-hooks/file/dfd2d386149f/forbid_external.py 
> <http://hg.netbeans.org/nb-hooks/file/dfd2d386149f/forbid_external.py>
> 
> It also contains temporary build artifacts (well, unfortunately such things
>> happen…)
>> 
>> 
>> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not
>> even be able to git-clone this over to their own github repos as those are
>> limited to 2GB.
>> So how should we get pull requests in that case?
>> 
>> I agree with you that we should preserve the history though.
>> Thus the idea with moving over the original hg repo to some other place
>> and switch it into read-only mode.
>> And have the new GIT repo stripped down to the core parts (of course with
>> their history).
>> 
> 
> I guess the question is what do you consider a core part. I think it would
> be OK to not keep history in the "main" repo for modules that were placed
> into separate repositories, like:
> http://hg.netbeans.org/community-visualweb/ 
> <http://hg.netbeans.org/community-visualweb/>
> (as far as I can tell, the history is kept in the split up repositories.)
> But is e.g. the Java support a core part? C/C++ support? PHP support?
> 

The core IDE, and even the main build, can be split from the feature-specific 
support regardless of what is deemed core. Even then, IMO some repos may not 
“stand alone” for their sub-components/modules without the main build as a 
sibling or parent. This gets into git sub-modules as a way to reduce both the 
repository size and what one has to check out to work on a specific piece of 
functionality. So, the basic pieces needed just to get the platform up and 
running, and the build going, are one big sub-set to me. Java-specific support 
could be another, though it could split on SE and EE. Next, PHP, which seems 
independent. HTML/Web/etc…another. Groovy stands out as another. I’m sure that 
list can keep going. Then, if the repos were essentially:

netbeans-core (however it is decided)
netbeans-java
netbeans-javaee
netbeans-groovy
netbeans-php
netbeans-c
netbeans-nodejs (or what ever the name is)
…etc

Then that seems manageable to me. There could then be a netbeans-main which, 
with a little restructuring, pulls in all the repos that make up the NetBeans 
release as one structure using git sub-modules. It could also break down 
differently. It could be like this:
netbeans-main (has what we would call core in it…along with the current 
build.xml and nbbuild…this would allow the current build system to keep working 
with little or no rework, I think)
netbeans-java
netbeans-javaee
… etc etc

and netbeans-main has git sub-modules for everything that makes up the 
“NetBeans release”. This would undoubtedly take some work. The history can also 
get interesting in these cases.
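
To make the sub-module layout concrete, here is a minimal sketch that wires two feature repos into a netbeans-main. Everything in it is an assumption for illustration: the repos are throwaway local ones created in a temp directory standing in for real hosted repositories, and the protocol.file.allow setting is only there because recent git versions refuse local-path sub-modules by default.

```shell
#!/bin/sh
# Sketch only: build a toy netbeans-main that references feature repos
# as git sub-modules. All repos here are local stand-ins, not real ones.
set -eu

work=$(mktemp -d)
cd "$work"

# Create a tiny repository with a single commit.
make_repo() {
    git init -q "$1"
    echo "$1" > "$1/README"
    git -C "$1" add README
    git -C "$1" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "Initial commit"
}

for name in netbeans-java netbeans-php; do
    make_repo "$name"
done

# The parent repo: in the second layout above it would also hold core,
# build.xml, and nbbuild.
make_repo netbeans-main
cd netbeans-main

for name in netbeans-java netbeans-php; do
    # With hosted repos this would be a plain `git submodule add <url> <path>`.
    git -c protocol.file.allow=always submodule --quiet add "$work/$name" "$name"
done

git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "Add feature repos as sub-modules"

# Lists each sub-module with the commit that netbeans-main pins it to.
git submodule status
```

The point to notice is that netbeans-main records only a pinned commit per sub-module, so someone working on PHP support alone could clone just that repo, while a release build would clone netbeans-main with --recurse-submodules.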

> For me personally, having history is very important - and having it inside
> my IDE (not in some other repository) is also important. For example, doing
> a change to CasualDiff without history could be quite painful:
> http://hg.netbeans.org/jet-main/annotate/437d7ca35923/java.source.base/src/org/netbeans/modules/java/source/save/CasualDiff.java
>  
> <http://hg.netbeans.org/jet-main/annotate/437d7ca35923/java.source.base/src/org/netbeans/modules/java/source/save/CasualDiff.java>
> 

Given history and regressions I can see that, but how would a change be more 
painful without the history? I’m asking about the specific context we are 
referring to. I can see maintaining some depth of history as helpful for most 
use cases. Beyond a certain depth, though, unless some obscure regression is 
hit, the oldest history doesn’t seem to come into play very often, other than 
for historical reasons or to see “who” worked on something. The code is what it 
is at some point in history. As an example, take a file with a decade of 
history: if it has many changes, its history will be deep, and that depth 
almost necessarily means that much or most of the file has changed, or that 
certain parts have changed a lot, and thus been completely rewritten, while 
others have not. The older history then becomes more of a liability if used for 
purposes other than identifying who did what; it is hard to sort through and 
keep in context. By older I mean not the date per se, but the depth and number 
of changes and iterations a file has gone through.

Of course, all that said, if the larger git repo were supported by the infra, 
then users cloning the repository could use depth, and most won’t need all the 
history. Also, until one gets into the low hundreds of GBs, cloning and 
spanning from a “cloud” perspective isn’t really that big of a deal, so I agree 
this is a GitHub business-decision limitation more than anything else.
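
As a small illustration of what cloning with depth buys, the sketch below builds a throwaway repo with five commits and then clones only the last two. The repo names and commit count are made up for the demo; the file:// URL matters because git ignores --depth when cloning from a plain local path.

```shell
#!/bin/sh
# Sketch only: compare a full history with a depth-limited clone of it.
set -eu

work=$(mktemp -d)
cd "$work"

# Build a toy repository with five commits.
git init -q full
i=1
while [ "$i" -le 5 ]; do
    echo "change $i" > full/file.txt
    git -C full add file.txt
    git -C full -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "Change $i"
    i=$((i + 1))
done

# Clone only the two most recent commits.
git clone -q --depth 2 "file://$work/full" shallow

full_count=$(git -C full rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)

echo "full history:    $full_count commits"
echo "shallow history: $shallow_count commits"
```

If older history turns out to be needed later, git fetch --deepen=<n> or git fetch --unshallow in the shallow clone pulls in more of it, so starting shallow is not a one-way decision.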

If git did not support depth cloning, I would very much argue that at some 
point too much history has sharply diminishing returns, as it becomes 
impractical for a community project’s members (all of them) to clone many, many 
GB of data. Data plans and connections are not the same worldwide, and 
generally that history is most useful for seeing whether a regression has been 
introduced and what exactly changed. The gamble is that a certain depth “back” 
will not contain the regressions and that the current code is fairly well 
vetted, so for most cases, taking a depth-based starting point, considering 
such deep history, should be “mostly” safe and manageable.

Thanks,

Wade
