Hi Mark,

On Fri, Oct 7, 2016 at 12:42 PM, Mark Struberg <[email protected]> wrote:
> Hi Emilian!
>
> The problem with 2 is that it won’t work nicely.
>
> There are 2 problems as sketched.
>
> a.) the repo contains binaries which are GPL licensed. That needs to get
> kicked out of the repo anyway.
>
> > What is important is the legal clearance at
> > the moment the code grant happens.
> Yes, but Oracle can only grant stuff under ALv2 where they own the rights
> themselves. They simply don’t own any rights for a hibernate.jar…
>
> Also the pure fact that it contains binaries at all is not really good.
> It’s called source code management for a reason.
>

Inside one of my NetBeans clones, in .hg/store, I did:
---
$ for extension in zip jar class exe dll o; do echo $extension; find . -type f -name "*\.$extension*" -print0 | xargs --null du --apparent-size -sch | grep total; echo; done
zip
38M total

jar
60M total

class
190K total

exe
15M total

dll
16M total

o
1,4M total
1,2M total
---

This may miss some binaries due to file name encoding (and maybe I did
something wrong in the experiment), but I would not expect gains
significantly bigger than this. Also, please note that not all historical
.jar/.zip files are actual binaries. Also, I am not sure whether exe and dll
are problematic - I believe a build made on Linux can currently run on
Windows, and that may be hard to achieve without having the exe/dll
precompiled.

> Not sure if the ant build already uses ivy. If not then we need to improve
> this.
>

NetBeans (currently) uses this:
http://wiki.netbeans.org/ExternalBinaries

Also note the historical:
http://wiki.netbeans.org/HgExternalBinaries

The latter explains the .jar/.zip files which are not actual binaries.

Also, I believe the official repos have a push hook in place which prevents
pushing overly large binary files with certain extensions into certain
directories:
http://hg.netbeans.org/nb-hooks/file/dfd2d386149f/forbid_external.py

> It also contains temporary build artifacts (well, unfortunately such things
> happen…)
>
> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not
> even be able to git-clone this over to their own github repos as those are
> limited to 2GB.
> So how should we get pull requests in that case?
>
> I agree with you that we should preserve the history though.
> Thus the idea with moving over the original hg repo to some other place
> and switch it into read-only mode.
> And have the new GIT repo stripped down to the core parts (of course with
> their history).
>

I guess the question is what you consider a core part. I think it would be
OK to not keep history in the "main" repo for modules that were placed into
separate repositories, like:
http://hg.netbeans.org/community-visualweb/

(as far as I can tell, the history is kept in the split up repositories.)

But is e.g. the Java support a core part? C/C++ support? PHP support?

For me personally, having history is very important - and having it inside
my IDE (not in some other repository) is also important. For example, doing
a change to CasualDiff without history could be quite painful:
http://hg.netbeans.org/jet-main/annotate/437d7ca35923/java.source.base/src/org/netbeans/modules/java/source/save/CasualDiff.java

Jan

PS: if anyone wants to clone Mark's converted repository, here is a mirror
that can be used for now:
git clone http://lahoda.info/netbeans-import.git/

(it is not permanent, but should be good for now)

> git-filter-branch is your friend.
>
> LieGrue,
> strub
>
>
> > On 07.10.2016 at 11:37, Emilian Bold <[email protected]> wrote:
> >
> > I vote for 2!
> >
> > I see no reason we should get rid of the history.
> >
> > The way I have read before, ASF does not need to have a legal clearance for
> > every historical code revision. What is important is the legal clearance at
> > the moment the code grant happens.
> >
> > I don't believe the GitHub 2GB limit is any indicator of anything except
> > their capacity and business decision.
> > The Linux kernel is close to 2GB,
> > OpenOffice is 1.5GB, Hadoop is 400MB, Lucene-Solr is 200MB, JMeter is
> > 200MB, etc.
> >
> > NetBeans is a project with over a decade of history with hundreds of people.
> > The first commit I see is from 1999.
> >
> > Of course such a large and old project will have a large repository!
> >
> > And as time passes each repository will only grow. I just read a
> > StackOverflow answer on how to determine the GitHub repository size and
> > their example for git/git mentioned it was 40MB -- it's, I believe, 200MB
> > now.
> >
> > I also don't think 3) will result in much economy. I doubt there are many
> > JARs or temporary build results.
> >
> > If the current repository turns out too much for the Apache Infra we could
> > decide in time how to improve that, but as an Incubation goal I believe
> > just switching to git should be enough.
> >
> >
> >
> > --emi
> >
> > On Fri, Oct 7, 2016 at 12:16 AM, Mark Struberg <[email protected]>
> > wrote:
> >
> >> Hi!
> >>
> >> I’ve migrated the NetBeans hg repo into GIT. Sadly this repo takes about
> >> 3.6 GiB and thus we cannot host it on github or Bitbucket (both have a 2GB
> >> limit).
> >> I am currently hosting the repo on a small private server.
> >> If anyone is interested then send me a private mail with your public key
> >> and I’ll give you access.
> >> Jaroslav, Geertjan and a few others already have a clone.
> >>
> >> There are basically 3 ways how we can handle this
> >>
> >> 1.) import a tarball into a fresh git repo. We would lose the history but
> >> we only have sources which are explicitly cleared by Oracle.
> >>
> >> 2.) import the full hg history. That is pretty thick which means it’s not
> >> that easy to clone. github pull requests also won’t work as we exceed the
> >> 2GB limit…
> >> In addition the hg repo currently also contains lots of GPL libraries like
> >> e.g. hibernate jar, etc. That’s something we don’t host at the ASF.
> >>
> >> 3.) Take the git import from hg and filter it. Remove all (most) jars,
> >> temporary build results etc. We might also get rid of a few old branches
> >> etc. If we keep the original hg repo around in read only mode then we
> >> should be able to lose tons of weight.
> >>
> >> I personally prefer option 3.
> >> But that is also the most labor intensive.
> >>
> >>
> >> LieGrue,
> >> strub
> >
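
A note on the measurement loop in this message: it prints two totals for the "o" extension, most likely because xargs split the file list across more than one du invocation, each of which reports its own total. A minimal sketch of a variant that sums the sizes in a single awk pass, so there is exactly one figure per extension, could look like the following. It assumes GNU find (for -printf), the function name total_bytes is made up for illustration, and it reports plain byte counts rather than du's human-readable units.

```shell
#!/bin/sh
# Hypothetical helper (not from the original mail): print the summed apparent
# size, in bytes, of all regular files under DIR whose name contains ".EXT".
# Usage: total_bytes DIR EXT
# Assumes GNU find, whose -printf '%s\n' emits each file's size in bytes.
total_bytes() {
    find "$1" -type f -name "*.$2*" -printf '%s\n' |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Same extension list as the original loop; scans the directory given as the
# first argument, or the current directory if none is given.
for extension in zip jar class exe dll o; do
    printf '%s\t%s\n' "$extension" "$(total_bytes "${1:-.}" "$extension")"
done
```

As in the original loop, the trailing wildcard is kept so that store files such as foo.jar.i are counted; it also means a pattern like "*.o*" matches more than object files, so the "o" figure is an upper bound.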
