Hi Mark,

On Fri, Oct 7, 2016 at 12:42 PM, Mark Struberg <[email protected]>
wrote:

> Hi Emilian!
>
> The problem with 2 is that it won’t work nicely.
>
> There are 2 problems as sketched.
>
> a.) the repo contains binaries which are GPL licensed. That needs to get
> kicked out of the repo anyway.
>
> > What is important is the legal clearance at
> > the moment the code grant happens.
> Yes, but Oracle can only grant stuff under ALv2 where they own the rights
> themselves. They simply don’t own any rights for a hibernate.jar…
>
> Also, the mere fact that it contains binaries at all is not really good.
> It’s called source code management for a reason.
>

Inside one of my NetBeans clones, in .hg/store, I did:
---
$ for extension in zip jar class exe dll o; do echo $extension; find . \
    -type f -name "*\.$extension*" -print0 | xargs --null du --apparent-size \
    -sch | grep total; echo; done
zip
38M     total

jar
60M     total

class
190K    total

exe
15M     total

dll
16M     total

o
1,4M    total
1,2M    total
---

This may miss some binaries due to file name encoding (and maybe I did
something wrong in the experiment), but I would not expect gains
significantly bigger than this. Please also note that not all historical
.jar/.zip files are actual binaries. Finally, I am not sure whether the exe
and dll files are problematic: I believe a build made on Linux can currently
run on Windows, and it may be difficult to achieve this without having the
exe/dll files precompiled.
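A single-pass variant of the measurement above can be sketched as follows
(assuming GNU find for `-printf` and GNU coreutils for `numfmt`; the function
names are made up for this example). Summing the byte counts in awk instead of
relying on `du -c` guarantees one total per extension, whereas `xargs` may
split a long file list across several `du` invocations, which is presumably
where the two "total" lines for `o` come from.

```shell
# sum_ext_bytes DIR EXT (hypothetical helper): sum the apparent sizes, in
# bytes, of regular files under DIR whose name contains ".EXT", using the
# same -name pattern as the loop above. Assumes GNU find (-printf '%s').
sum_ext_bytes() {
  local dir=$1 ext=$2
  find "$dir" -type f -name "*.$ext*" -printf '%s\n' \
    | awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# report_ext_sizes [DIR] (hypothetical helper): print one human-readable
# total per extension. Assumes GNU coreutils' numfmt for IEC (K/M/G) units.
report_ext_sizes() {
  local dir=${1:-.} ext
  for ext in zip jar class exe dll o; do
    printf '%-6s %s\n' "$ext" "$(sum_ext_bytes "$dir" "$ext" | numfmt --to=iec)"
  done
}
```

For example, `cd .hg/store && report_ext_sizes .` prints one line per
extension. The figures use IEC units like `du -h`, though rounding may differ
slightly.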


> Not sure if the ant build already uses ivy. If not, then we need to
> improve this.
>

NetBeans (currently) uses this:
http://wiki.netbeans.org/ExternalBinaries

Also note the historical:
http://wiki.netbeans.org/HgExternalBinaries

The latter explains the .jar/.zip files which are not actual binaries.
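The general idea behind such external-binary schemes, as far as I understand
it, is that the repository tracks only a small list of checksums while the
binaries themselves are fetched from elsewhere at build time. Below is a
sketch of the verification half of that idea. The list format (one
`<SHA-1> <file name>` entry per line, `#` for comments) and the
`verify_binaries` function are hypothetical, not taken from the NetBeans
build; it assumes GNU coreutils' `sha1sum`.

```shell
# verify_binaries LIST DIR (hypothetical helper)
# For each "<sha1> <name>" line in LIST (assumed format; blank lines and lines
# starting with '#' are skipped), check that DIR/<name> exists and that its
# SHA-1 matches. Prints one line per problem; returns 1 if any were found.
verify_binaries() {
  local list=$1 dir=$2 status=0 sha name expected actual
  # "|| [ -n "$sha" ]" also processes a last line without a trailing newline.
  while read -r sha name || [ -n "$sha" ]; do
    case $sha in ''|\#*) continue ;; esac
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $name"; status=1; continue
    fi
    # sha1sum prints lowercase hex, so normalize the expected value first.
    expected=$(printf '%s' "$sha" | tr '[:upper:]' '[:lower:]')
    actual=$(sha1sum "$dir/$name" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
      echo "checksum mismatch: $name"; status=1
    fi
  done < "$list"
  return "$status"
}
```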

Also, I believe the official repos have a push hook in place which prevents
pushing overly large binary files with certain extensions into certain
directories:
http://hg.netbeans.org/nb-hooks/file/dfd2d386149f/forbid_external.py
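After a move to Git, a comparable server-side safeguard could be sketched as
follows. This only illustrates the idea and is not a port of
forbid_external.py: the extension list, the 1 MiB limit (`MAX_BYTES`) and the
function name are placeholders, and unlike the hook it does not restrict the
check to particular directories. It relies on `git rev-list --objects` and
the `%(rest)` atom of `git cat-file --batch-check`.

```shell
# Placeholder policy (hypothetical values, adjust as needed).
MAX_BYTES=${MAX_BYTES:-1048576}          # 1 MiB
BINARY_RE='[.](jar|zip|exe|dll)$'

# list_large_binaries REV... (hypothetical helper): print "<size> <path>" for
# each blob reachable from the given revisions (e.g. "old..new") whose path
# matches BINARY_RE and whose size exceeds MAX_BYTES.
list_large_binaries() {
  git rev-list --objects "$@" \
    | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
    | awk -v max="$MAX_BYTES" -v re="$BINARY_RE" '
        $1 == "blob" {
          # Take everything after "type size " so paths may contain spaces.
          path = substr($0, length($1) + length($2) + 3)
          if ($2 + 0 > max + 0 && path ~ re) print $2, path
        }'
}
```

In a `pre-receive` hook, Git supplies `<old> <new> <ref>` lines on standard
input, so the hook could call `list_large_binaries "$old..$new"` for each
updated ref and reject the push if anything is printed. A newly created ref
(where `<old>` is all zeros) would need separate handling, e.g.
`list_large_binaries "$new" --not --all`.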

> It also contains temporary build artifacts (well, unfortunately such things
> happen…)
>
>
> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not
> even be able to git-clone this over to their own github repos as those are
> limited to 2GB.
> So how should we get pull requests in that case?
>
> I agree with you that we should preserve the history though.
> Hence the idea of moving the original hg repo to some other place
> and switching it to read-only mode.
> And have the new GIT repo stripped down to the core parts (of course with
> their history).
>
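The filtering step of option 3 could be sketched with `git filter-branch`,
which Mark recommends further down. The function below (a hypothetical name)
rewrites every ref of the repository in the current directory, removing
`*.jar` and `*.zip` files from every commit and dropping commits that become
empty. The pattern list is only an example: as noted above, some historical
.jar/.zip entries are not actual binaries, so it would need review before
being applied to the NetBeans history.

```shell
# strip_binaries (hypothetical helper): rewrite ALL refs of the repository in
# the current directory, removing *.jar and *.zip files from every commit.
# Destructive: run it on a fresh clone only.
strip_binaries() {
  # FILTER_BRANCH_SQUELCH_WARNING=1 skips the warning and pause that newer Git
  # versions show before filter-branch runs; --force allows re-running when a
  # refs/original/ backup already exists; --prune-empty drops emptied commits.
  # In Git pathspecs '*' also matches '/', so "*.jar" covers subdirectories.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --prune-empty \
    --index-filter 'git rm --cached --ignore-unmatch -q -- "*.jar" "*.zip"' \
    -- --all
}
```

The old objects stay reachable through `refs/original/` and the reflog, so the
size reduction only shows up after those are removed and the repository is
garbage-collected, or in a new clone of the rewritten repository.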

I guess the question is what you consider a core part. I think it would
be OK to not keep history in the "main" repo for modules that were placed
into separate repositories, like:
http://hg.netbeans.org/community-visualweb/
(as far as I can tell, the history is kept in the split up repositories.)
But is e.g. the Java support a core part? C/C++ support? PHP support?

For me personally, having history is very important - and having it inside
my IDE (not in some other repository) is also important. For example, making
a change to CasualDiff without history could be quite painful:
http://hg.netbeans.org/jet-main/annotate/437d7ca35923/java.source.base/src/org/netbeans/modules/java/source/save/CasualDiff.java

Jan

PS: if someone wants to clone Mark's converted repository, here is the
location of a mirror that is currently available:
git clone http://lahoda.info/netbeans-import.git/

(it is not permanent, but should be good for now)

> git-filter-branch is your friend.
>
> LieGrue,
> strub
>
>
> > On 07.10.2016 at 11:37, Emilian Bold <[email protected]> wrote:
> >
> > I vote for 2!
> >
> > I see no reason we should get rid of the history.
> >
> > From what I have read before, the ASF does not need to have legal
> > clearance for every historical code revision. What is important is the
> > legal clearance at the moment the code grant happens.
> >
> > I don't believe the GitHub 2GB limit is any indicator of anything except
> > their capacity and business decision. The Linux kernel is close to 2GB,
> > OpenOffice is 1.5GB, Hadoop is 400MB, Lucene-Solr is 200MB, JMeter is
> > 200MB, etc.
> >
> > NetBeans is a project with over a decade of history and hundreds of
> > people. The first commit I see is from 1999.
> >
> > Of course such a large and old project will have a large repository!
> >
> > And as time passes each repository will only grow. I just read a
> > StackOverflow answer on how to determine the GitHub repository size and
> > their example for git/git mentioned it was 40MB -- it's, I believe, 200MB
> > now.
> >
> > I also don't think 3) will save much space. I doubt there are many
> > JARs or temporary build results.
> >
> > If the current repository turns out to be too much for Apache Infra, we
> > could decide in due course how to improve that, but as an Incubation goal
> > I believe just switching to git should be enough.
> >
> >
> >
> > --emi
> >
> > On Fri, Oct 7, 2016 at 12:16 AM, Mark Struberg <[email protected]> wrote:
> >
> >> Hi!
> >>
> >> I’ve migrated the NetBeans hg repo into GIT. Sadly this repo takes about
> >> 3.6 GiB and thus we cannot host it on github or Bitbucket (both have a
> >> 2GB limit).
> >> I am currently hosting the repo on a small private server.
> >> If anyone is interested then send me a private mail with your public key
> >> and I’ll give you access.
> >> Jaroslav, Geertjan and a few others already have a clone.
> >>
> >> There are basically 3 ways we can handle this:
> >>
> >> 1.) import a tarball into a fresh git repo. We would lose the history,
> >> but we would only have sources which are explicitly cleared by Oracle.
> >>
> >> 2.) import the full hg history. That is pretty thick, which means it’s not
> >> that easy to clone. GitHub pull requests also won’t work as we exceed the
> >> 2GB limit…
> >> In addition, the hg repo currently also contains lots of GPL libraries,
> >> e.g. the hibernate jar. That’s something we don’t host at the ASF.
> >>
> >> 3.) Take the git import from hg and filter it. Remove all (most) jars,
> >> temporary build results, etc. We might also get rid of a few old
> >> branches. If we keep the original hg repo around in read-only mode, then
> >> we should be able to lose tons of weight.
> >>
> >> I personally prefer option 3.
> >> But that is also the most labor-intensive.
> >>
> >>
> >> LieGrue,
> >> strub
>
>
