I think the issue should not be whether or not to keep the history... I
think clearly the BEST thing is to keep the history. Even if you never
touch it, you still need the history, at the very least for attribution!

The issue is: things are too big. They are not big because Git can't handle
big things, but maybe they are too big because the project was centralized
and run as a single thing inside a single company. Maybe it makes sense to
start separating things a bit, to make it easier for others to join in!

NetBeans is a HUGE codebase, but it is also very modular! I'm sure we
could separate things into their own repositories, and that alone would
make it easier for others to contribute, and even to reuse them in other
Apache projects.

So, instead of discussing the size or the download, wouldn't a more valid
discussion be whether there is a reasonable way to split NetBeans into a
(small) set of meaningful repositories?

Git has the concept of "submodules" and also "subtrees". There is even a
"sub-repo" command[1] that improves on both ideas. Any of those would allow
us to include "sub-repositories" inside a main repository. So, let's say we
divided NetBeans at the Java "package" level: we could still have a
"NetBeans" repository that references the whole codebase as a single
"thing", but most of the project would actually be handled in the
sub-projects.
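
To make this concrete, here is a rough sketch of what that could look like
with submodules (the repository names and URLs below are just hypothetical
examples, not a proposal for the actual split):

    # In the umbrella "netbeans" repository, reference two hypothetical
    # sub-repositories as submodules:
    git submodule add https://git.example.org/netbeans-platform.git platform
    git submodule add https://git.example.org/netbeans-java.git java
    git commit -m "Reference platform and java modules as submodules"

    # Contributors could then clone only the piece they work on:
    git clone https://git.example.org/netbeans-java.git

    # ...or the umbrella repository with everything wired together:
    git clone --recursive https://git.example.org/netbeans.git

With subtrees the idea is similar, except the sub-project's content gets
merged into the main repository instead of just being referenced:

    git subtree add --prefix=java https://git.example.org/netbeans-java.git master --squash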

Would this be a doable option?

Cheers!
Bruno.

[1] https://github.com/ingydotnet/git-subrepo#readme

______________________________________________________________________
Bruno Peres Ferreira de Souza                         Brazil's JavaMan
http://www.javaman.com.br                      bruno at javaman.com.br
     if I fail, if I succeed, at least I live as I believe


On Fri, Oct 14, 2016 at 9:55 AM, Wade Chandler <cons...@wadechandler.com>
wrote:

>
> > On Oct 14, 2016, at 07:06, Emilian Bold <e...@apache.org> wrote:
> >
> > Hello,
> >
> > I've recently learned git allows 'shallow' clones that may contain no
> > history whatsoever.
> >
> > See the git clone manual <https://git-scm.com/docs/git-clone>,
> specifically
> > the --depth parameter.
> >
> > Obviously this will be a huge bandwidth, time and disk saver for some
> > people.
> >
>
> I agree shallow git clones are great. I think I would use them even with
> smaller repos until I needed to know more.
>
> > And it seems that git even supports push / pull from shallow
> repositories.
> >
> > I believe this would permit us to still use a single unaltered repository
> > while allowing users (or GitHub mirrors) to be shallow.
> >
>
> Yes, but then the whole is much larger still. The repository is 1GB just
> for the sources. If I’m working on Groovy, Java, and Core, then I don’t
> need PHP, C/C++, or others, and frankly they are out of context in that
> case. I think perhaps as a start we look at how to get moved over, but of
> course have to be able to put it in the infra regardless of thoughts on
> this, and then figure out something. i.e. it isn’t scalable IMO that
> everyone working on every technology has to contribute and merge up with
> everyone else working on other technologies unless they are actually
> changing some central thing.
>
> > PS: Philosophically speaking, I see all this discussion about repository
> > size and history stripping as a failure of DVCS
> > <https://en.wikipedia.org/wiki/Distributed_version_control>s and/or of
> the
> > Internet infrastructure. Removing history is the equivalent of removing
> > comments to save disk space.
>
> I don’t think that last statement is necessarily accurate. I mean, if a
> file has so many changes that those old depths are irrelevant and useless,
> then what meaning do they have? It is hard to make a case they are useful
> after some time. To me it is like keeping too much stuff in the house
> because we are afraid to get rid of it. If you will never touch it, does
> it have any meaning? You might keep something, and some time down the road
> you go “Man, if I had that I could have made 10,000 bucks!”, but then if
> you had sold off old stuff and saved the money as you went through life,
> you probably would have had more money instantly available. But, the times
> you had that 10,000 dollar item lying around were probably so rare you
> can’t remember them, or never had them. Maybe a bad analogy, but I think
> there is still a point when history is just stale, and even if slightly
> useful, not much, due to the complication of its relevance to “now” at any
> point in time; the bigger the depth of a file’s history, the bigger the
> complexity between depth N and depth 1, IMO.
>
> On the DVCS stuff, I don’t know. It is like the “cloud”. Smaller things
> just scale better until not only disk space but bandwidth gets cheaper and
> more available. Even in large networks like AWS, smaller drives scale
> better for some problems, whereas bigger ones don’t, because you are
> dealing with so
> many connections and data pools. Even if we were using SVN, then if we
> depended on pulling down all C++, Python, PHP, Java, Groovy, etc just to
> work on say JavaScript, and if those things made the pull over 1GB, I think
> the same problem would exist, and personally I don’t find it practical. So,
> I see it as a problem of structure versus as much a problem with the
> technology…at least until we have quantum SSDs and quantum entanglement
> driven networks :-D
>
> Wade
