I like Wade’s idea of splitting the repository into several repositories, one per logical module.
This way you can track which modules have been okay’d for licenses, etc. much more easily than you can with one big code base. Also, does anyone need the entire codebase checked out? Maybe really only the CI build machine…

Regards
John

> On 7 Oct 2016, at 11:42, Mark Struberg <[email protected]> wrote:
>
> Hi Emilian!
>
> The problem with 2 is that it won’t work nicely.
>
> There are 2 problems as sketched.
>
> a.) the repo contains binaries which are GPL licensed. Those need to get
> kicked out of the repo anyway.
>
>> What is important is the legal clearance at
>> the moment the code grant happens.
> Yes, but Oracle can only grant stuff under ALv2 where they own the rights
> themselves. They simply don’t own any rights for a hibernate.jar…
>
> Also the pure fact that it contains binaries at all is not really good. It’s
> called source code management for a reason.
> Not sure if the ant build already uses ivy. If not then we need to improve
> this.
> It also contains temporary build artifacts (well, unfortunately such things
> happen…)
>
>
> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not even
> be able to git-clone this over to their own github repos as those are limited
> to 2GB.
> So how should we get pull requests in that case?
>
> I agree with you that we should preserve the history though.
> Thus the idea of moving the original hg repo to some other place and
> switching it into read-only mode.
> And have the new GIT repo stripped down to the core parts (of course with
> their history).
> git-filter-branch is your friend.
>
> LieGrue,
> strub
>
>
>> On 07.10.2016 at 11:37, Emilian Bold <[email protected]> wrote:
>>
>> I vote for 2!
>>
>> I see no reason we should get rid of the history.
>>
>> The way I have read before, ASF does not need to have a legal clearance for
>> every historical code revision. What is important is the legal clearance at
>> the moment the code grant happens.
>>
>> I don't believe the GitHub 2GB limit is any indicator of anything except
>> their capacity and business decision. The Linux kernel is close to 2GB,
>> OpenOffice is 1.5GB, Hadoop is 400MB, Lucene-Solr is 200MB, JMeter is
>> 200MB, etc.
>>
>> NetBeans is a project with over a decade of history with hundreds of people.
>> The first commit I see is from 1999.
>>
>> Of course such a large and old project will have a large repository!
>>
>> And as time passes each repository will only grow. I just read a
>> StackOverflow answer on how to determine the GitHub repository size and
>> their example for git/git mentioned it was 40MB -- it's, I believe, 200MB
>> now.
>>
>> I also don't think 3) will result in much economy. I doubt there are many
>> JARs or temporary build results.
>>
>> If the current repository turns out too much for the Apache Infra we could
>> decide in time how to improve that, but as an Incubation goal I believe
>> just switching to git should be enough.
>>
>>
>>
>> --emi
>>
>> On Fri, Oct 7, 2016 at 12:16 AM, Mark Struberg <[email protected]>
>> wrote:
>>
>>> Hi!
>>>
>>> I’ve migrated the NetBeans hg repo into GIT. Sadly this repo takes about
>>> 3.6 GiB and thus we cannot host it on github or Bitbucket (both have a 2GB
>>> limit).
>>> I am currently hosting the repo on a small private server.
>>> If anyone is interested then send me a private mail with your public key
>>> and I’ll give you access.
>>> Jaroslav, Geertjan and a few others already have a clone.
>>>
>>> There are basically 3 ways we can handle this:
>>>
>>> 1.) import a tarball into a fresh git repo. We would lose the history but
>>> we only have sources which are explicitly cleared by Oracle.
>>>
>>> 2.) import the full hg history. That is pretty thick which means it’s not
>>> that easy to clone. github pull requests also won’t work as we exceed the
>>> 2GB limit…
>>> In addition the hg repo currently also contains lots of GPL libraries like
>>> e.g. hibernate jar, etc.
>>> That’s something we don’t host at the ASF.
>>>
>>> 3.) Take the git import from hg and filter it. Remove all (most) jars,
>>> temporary build results etc. We might also get rid of a few old branches
>>> etc. If we keep the original hg repo around in read-only mode then we
>>> should be able to lose tons of weight.
>>>
>>> I personally prefer option 3.
>>> But that is also the most labor intensive.
>>>
>>>
>>> LieGrue,
>>> strub
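
For anyone who wants to try option 3 on a local clone: below is a rough sketch of how the git-filter-branch call mentioned in the thread could be assembled. It only builds and prints the command and does not run it. The function name build_filter_command and the patterns "*.jar" and "*.class" are assumptions for illustration; the thread itself only says "jars" and "temporary build results", so the real pattern list would have to be worked out against the actual repo.

```python
import shlex


def build_filter_command(patterns):
    """Build the argv for a git filter-branch run that drops matching paths.

    `patterns` are git pathspecs (hypothetical examples: "*.jar", "*.class").
    The command is only constructed here, never executed.
    """
    # Quote each pathspec so the shell that git spawns for the index filter
    # does not expand the globs itself; git should see them literally.
    quoted = " ".join(shlex.quote(p) for p in patterns)

    # --cached: touch only the index; --ignore-unmatch: do not fail on
    # commits that contain none of the listed paths.
    index_filter = f"git rm -r --cached --ignore-unmatch -- {quoted}"

    return [
        "git", "filter-branch",
        "--index-filter", index_filter,
        "--prune-empty",   # drop commits that become empty after filtering
        "--", "--all",     # rewrite every ref, not just the current branch
    ]


if __name__ == "__main__":
    cmd = build_filter_command(["*.jar", "*.class"])
    # Print a copy-pasteable command line for review before running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Rewriting history this way changes every commit id, so it would have to happen once, before anyone bases work on the new repo, with the original hg repo kept read-only as the reference.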
