I like Wade’s idea of splitting the repository up into several smaller
repositories, one per logical module.

This way you can track which modules have been okay’d for licenses, etc.,
much more easily than you can with one big code base.
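To illustrate the per-module license tracking idea, here is a minimal sketch that groups binary files by top-level directory, treating each top-level directory as a "module". The suffix list, the module-equals-top-level-directory rule and the example paths are all assumptions for illustration, not the actual NetBeans layout.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: these suffixes mark binaries that need a license review
# (or removal) before a module can be considered cleared.
BINARY_SUFFIXES = {".jar", ".zip", ".class"}


def binaries_by_module(paths):
    """Group binary files by top-level directory, treated here as a module.

    `paths` is any iterable of repo-relative paths, e.g. the lines printed
    by `git ls-files`. Binaries directly in the repo root go under "<root>".
    Modules without any binaries do not appear in the result.
    """
    result = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw.strip())
        if path.suffix.lower() not in BINARY_SUFFIXES:
            continue
        module = path.parts[0] if len(path.parts) > 1 else "<root>"
        result[module].append(str(path))
    return dict(result)


# Hypothetical paths, for illustration only.
paths = [
    "db/external/hibernate-3.2.jar",
    "db/src/Foo.java",
    "java.source/src/Bar.java",
    "libs.junit4/external/junit-4.12.jar",
    "build.xml",
]
print(binaries_by_module(paths))
# prints {'db': ['db/external/hibernate-3.2.jar'], 'libs.junit4': ['libs.junit4/external/junit-4.12.jar']}
```

On a real checkout the input could come from `subprocess.run(["git", "ls-files"], capture_output=True, text=True, check=True).stdout.splitlines()`. Modules that never show up in the result contain no binaries and would be the easiest ones to clear first.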

Also, does anyone really need the entire codebase checked out? Maybe only the
CI build machine…

Regards

John



> On 7 Oct 2016, at 11:42, Mark Struberg <[email protected]> wrote:
> 
> Hi Emilian!
> 
> The problem with 2 is that it won’t work nicely.
> 
> There are 2 problems, as sketched.
> 
> a.) the repo contains binaries which are GPL licensed. Those need to get
> kicked out of the repo anyway.
> 
>> What is important is the legal clearance at
>> the moment the code grant happens.
> Yes, but Oracle can only grant code under the ALv2 where they own the rights
> themselves. They simply don’t own any rights to a hibernate.jar…
> 
> Also, the mere fact that it contains binaries at all is not good. It’s
> called source code management for a reason.
> Not sure whether the Ant build already uses Ivy. If not, then we need to
> improve this.
> It also contains temporary build artifacts (well, unfortunately such things 
> happen…)
> 
> 
> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not even
> be able to git-clone this over to their own GitHub repos, as those are
> limited to 2GB.
> So how would we get pull requests in that case?
> 
> I agree with you that we should preserve the history, though.
> Hence the idea of moving the original hg repo to some other place and
> switching it into read-only mode,
> and having the new Git repo stripped down to the core parts (of course with
> their history).
> git-filter-branch is your friend.
> 
> LieGrue,
> strub
> 
> 
>> On 07.10.2016 at 11:37, Emilian Bold <[email protected]> wrote:
>> 
>> I vote for 2!
>> 
>> I see no reason we should get rid of the history.
>> 
>> From what I have read before, the ASF does not need legal clearance for
>> every historical code revision. What is important is the legal clearance at
>> the moment the code grant happens.
>> 
>> I don't believe the GitHub 2GB limit is an indicator of anything except
>> their capacity and business decisions. The Linux kernel is close to 2GB,
>> OpenOffice is 1.5GB, Hadoop is 400MB, Lucene-Solr is 200MB, JMeter is
>> 200MB, etc.
>> 
>> NetBeans is a project with over a decade of history and hundreds of people.
>> The first commit I see is from 1999.
>> 
>> Of course such a large and old project will have a large repository!
>> 
>> And as time passes, every repository will only grow. I just read a
>> StackOverflow answer on how to determine the size of a GitHub repository,
>> and its example for git/git mentioned 40MB; I believe it is 200MB
>> now.
>> 
>> I also don't think option 3 will save much space. I doubt there are many
>> JARs or temporary build results.
>> 
>> If the current repository turns out to be too much for Apache Infra, we can
>> decide later how to improve that, but as an Incubation goal I believe
>> just switching to Git should be enough.
>> 
>> 
>> 
>> --emi
>> 
>> On Fri, Oct 7, 2016 at 12:16 AM, Mark Struberg <[email protected]>
>> wrote:
>> 
>>> Hi!
>>> 
>>> I’ve migrated the NetBeans hg repo to Git. Sadly this repo takes about
>>> 3.6 GiB and thus we cannot host it on GitHub or Bitbucket (both have a 2GB
>>> limit).
>>> I am currently hosting the repo on a small private server.
>>> If anyone is interested then send me a private mail with your public key
>>> and I’ll give you access.
>>> Jaroslav, Geertjan and a few others already have a clone.
>>> 
>>> There are basically 3 ways we can handle this:
>>> 
>>> 1.) import a tarball into a fresh git repo. We would lose the history, but
>>> we would only have sources which are explicitly cleared by Oracle.
>>> 
>>> 2.) import the full hg history. That is pretty thick, which means it’s not
>>> that easy to clone. GitHub pull requests also won’t work, as we exceed the
>>> 2GB limit…
>>> In addition, the hg repo currently also contains lots of GPL libraries such
>>> as the hibernate jar, etc. That’s something we don’t host at the ASF.
>>> 
>>> 3.) Take the git import from hg and filter it. Remove all (or most) jars,
>>> temporary build results, etc. We might also get rid of a few old branches.
>>> If we keep the original hg repo around in read-only mode, then we
>>> should be able to lose tons of weight.
>>> 
>>> I personally prefer option 3,
>>> but it is also the most labor-intensive.
>>> 
>>> 
>>> LieGrue,
>>> strub
> 
