@Bruno see the thread "[PROPOSAL] Split the main NetBeans repo", which
discusses splitting the repo per cluster. It's a pretty good idea and the
history loss is minimal. It's probably the way we are going to go.


--emi

On Mon, Oct 17, 2016 at 10:08 PM, Bruno Souza <br...@javaman.com.br> wrote:

> I think the issue is not whether to keep the history or not... clearly
> the BEST thing is to keep the history. Even if you never touch it, you
> still need the history for attribution, at least!
>
> The issue is: things are too big. They are not big because Git can't handle
> big things, but maybe because the project was centralized and run as a
> single thing in a single company. Maybe it makes sense to start separating
> things a bit, to make it easier for others to join in!
>
> NetBeans is a HUGE codebase, but it is also very modular! I'm sure we
> could separate things into their own repositories, and that alone would
> make it easier for others to contribute, and even to reuse pieces in other
> Apache projects.
>
> So, instead of discussing the size or the download, wouldn't a more valid
> discussion be whether there is a reasonable way to split NetBeans into
> a (small) set of meaningful repositories?
>
> Git has the concept of "submodules" and also "subtrees". There is even a
> "sub-repo" command[1] that improves on both ideas. Any of those would allow
> us to include "sub-repositories" inside a main repository. So, let's say we
> divided NetBeans at the Java "package" level: we could still have a
> "NetBeans" repository that references the whole codebase as a single
> "thing", but most of the project would actually be handled in the
> sub-projects.
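>
> For instance (the repository names here are hypothetical, just to sketch
> the idea), adding one cluster as a submodule of an umbrella repository
> would look something like:
>
>     git submodule add https://github.com/apache/netbeans-java.git java
>     git commit -m "Track the java cluster as a submodule"
>
> and a contributor could then initialize only the clusters they care
> about, e.g. "git submodule update --init java".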
>
> Would this be a doable option?
>
> Cheers!
> Bruno.
>
> [1] https://github.com/ingydotnet/git-subrepo#readme
>
> Bruno.
> ______________________________________________________________________
> Bruno Peres Ferreira de Souza                         Brazil's JavaMan
> http://www.javaman.com.br                      bruno at javaman.com.br
>      if I fail, if I succeed, at least I live as I believe
>
>
> On Fri, Oct 14, 2016 at 9:55 AM, Wade Chandler <cons...@wadechandler.com>
> wrote:
>
> >
> > > On Oct 14, 2016, at 07:06, Emilian Bold <e...@apache.org> wrote:
> > >
> > > Hello,
> > >
> > > I've recently learned git allows 'shallow' clones that may contain no
> > > history whatsoever.
> > >
> > > See the git clone manual <https://git-scm.com/docs/git-clone>,
> > > specifically the --depth parameter.
> > >
> > > Obviously this will be a huge bandwidth, time and disk saver for some
> > > people.
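> > >
> > > For example (the URL is just illustrative), a clone with no history
> > > at all would be:
> > >
> > >     git clone --depth 1 https://github.com/apache/netbeans.git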
> > >
> >
> > I agree shallow git clones are great. I think I would use them even with
> > smaller repos until I needed to know more.
> >
> > > And it seems that git even supports push / pull from shallow
> > > repositories.
> > >
> > > I believe this would permit us to still use a single unaltered
> > > repository while allowing users (or GitHub mirrors) to be shallow.
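> > >
> > > And if the full history turns out to be needed later, a shallow
> > > clone can always be deepened in place:
> > >
> > >     git fetch --unshallow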
> > >
> >
> > Yes, but then the whole is still much larger. The repository is 1GB just
> > for the sources. If I'm working on Groovy, Java, and Core, then I don't
> > need PHP, C/C++, or others, and frankly they are out of context in that
> > case. I think perhaps as a start we look at how to get moved over (we
> > have to be able to put it in the infra regardless of thoughts on this)
> > and then figure something out. I.e., it isn't scalable IMO that everyone
> > working on one technology has to contribute and merge up with everyone
> > else working on other technologies unless they are actually changing
> > some central thing.
> >
> > > PS: Philosophically speaking, I see all this discussion about
> > > repository size and history stripping as a failure of DVCSs
> > > <https://en.wikipedia.org/wiki/Distributed_version_control> and/or
> > > of the Internet infrastructure. Removing history is the equivalent
> > > of removing comments to save disk space.
> >
> > I don't think that last statement is necessarily accurate. I mean, if a
> > file has so many changes that those old depths are irrelevant and
> > useless, then what meaning do they have? It is hard to make a case that
> > they are useful after some time. To me it is like keeping too much stuff
> > in the house because we are afraid to get rid of it. If you will never
> > touch it, does it have any meaning? You might keep something, and some
> > time down the road you go "Man, if I had that I could have made 10,000
> > bucks!", but if you had sold off old stuff and saved the money as you
> > went through life, you probably would have had more money instantly
> > available. And the rare times you had that 10,000 dollar item lying
> > around were probably so rare you can't remember them, or you never had
> > them. Maybe a bad analogy, but I think there is still a point when
> > history is just stale, and even if slightly useful, not by much, due to
> > the complication of its relevance to "now" at any point in time; the
> > bigger the depth of a file's history, the bigger the complexity between
> > depth N and depth 1, IMO.
> >
> > On the DVCS stuff, I don't know. It is like the "cloud": smaller things
> > just scale better, until not only disk space but bandwidth gets cheaper
> > and more available. Even in large networks like AWS, smaller drives
> > scale better for some problems whereas bigger ones don't, because you
> > are dealing with so many connections and data pools. Even if we were
> > using SVN, if we depended on pulling down all the C++, Python, PHP,
> > Java, Groovy, etc. just to work on, say, JavaScript, and if those things
> > made the pull over 1GB, I think the same problem would exist, and
> > personally I don't find it practical. So, I see it as a problem of
> > structure more than a problem with the technology... at least until we
> > have quantum SSDs and quantum-entanglement-driven networks :-D
> >
> > Wade
>
