On Fri, Oct 7, 2016 at 5:23 PM, Wade Chandler <[email protected]>
wrote:

>
> > On Oct 7, 2016, at 10:23, Jan Lahoda <[email protected]> wrote:
> >
>

[snip]


> >> b.) the repo size is about 3.6 GiB. That’s really huge. Devs would not
> >> even be able to git-clone this over to their own github repos as those
> >> are limited to 2GB.
> >> So how should we get pull requests in that case?
> >>
> >> I agree with you that we should preserve the history though.
> >> Thus the idea with moving over the original hg repo to some other place
> >> and switch it into read-only mode.
> >> And have the new GIT repo stripped down to the core parts (of course
> >> with their history).
> >>
> >
> > I guess the question is what do you consider a core part. I think it
> > would be OK to not keep history in the "main" repo for modules that were
> > placed into separate repositories, like:
> > http://hg.netbeans.org/community-visualweb/
> > (as far as I can tell, the history is kept in the split up repositories.)
> > But is e.g. the Java support a core part? C/C++ support? PHP support?
> >
>
> The core IDE and even the main build can be split from the other specific
> feature support, regardless of what is said to be in the core, and even
> then, some repos may not “stand alone” for the sub-components/modules IMO
> with regard to the main build as a sibling or parent; this gets into
> git submodules as a way to break up the size as well as what one has to
> check out to work on a specific piece of functionality. So, the basic pieces
> to just get the platform up and running, and the build going, are one big
> subset to me. Then, Java-specific support could be another. It could split
> on SE and EE though. Next, PHP, that seems independent.
> HTML/Web/etc…another. Groovy stands out as another. I’m sure that can keep
> going. Then, if the repos were essentially:
>
> netbeans-core (however it is decided)
> netbeans-java
> netbeans-javaee
> netbeans-groovy
> netbeans-php
> netbeans-c
> netbeans-nodejs (or what ever the name is)
> …etc
>
> Then that seems manageable to me. There could then be netbeans-main which,
> with a little restructuring, has all those which make up the NetBeans
> release as an entire structure with git submodules. It could also break
> down differently, for example like this:
> netbeans-main (has what we would call core in it…along with current
> build.xml and nbbuild…this would allow the current build system to keep
> working without any or much rework I think)
> netbeans-java
> netbeans-javaee
> … etc etc
>
> and netbeans-main has git sub-modules for all of what is the “NetBeans
> release”. This would take some work undoubtedly. Too, the history can be
> interesting in these cases.
>

That's doable in principle, of course. One obvious (long-discussed) way to
split the repository is based on clusters (although even that might be
tricky). But someone needs to actually do the work, and adjusting the build
system may or may not be simple.

One possibility would be to use Module Suites for the clusters (at least
for clusters other than platform). The things that would need to be solved
in that case are test dependencies: I suspect test-to-test dependencies
among Module Suites may not work (and surely cannot work when compiling
against binary clusters, as binary clusters don't currently have tests).
And a lot of tests depend on tests in openide.util.lookup. (Also, I'm not
sure whether qa-functional tests are supported in Module Suites; I would
need to check.)


> > For me personally, having history is very important - and having it
> > inside my IDE (not in some other repository) is also important. For
> > example, doing a change to CasualDiff without history could be quite
> > painful:
> > http://hg.netbeans.org/jet-main/annotate/437d7ca35923/java.source.base/src/org/netbeans/modules/java/source/save/CasualDiff.java
> >
>
> Given history and regressions I can see that, but how would a change be
> more painful without the history? I’m asking for the specific context which
> we are referring to. For instance, I can see a depth of history being
> maintained as helpful for most use cases, but beyond that, unless some
> obscure regression is hit, then the most dated history doesn’t seem to come
> into play too often beyond a certain depth; other than just historical
> reasons or to see “who” worked on something. The code is what it is at some
> point in history. As an example, given some file which has a decade of
> history, if it has many changes, then its history will be deep, and
> that depth will almost necessarily mean that much or most of that file has
> changed, or that certain parts have changed a lot, and thus been completely
> rewritten, and others not so much. The older history then becomes more of a
> liability if used for purposes other than identifying who did what; it is
> hard to sort through and put in context. Older not relative to date per se
> but to the depth and number of changes and iterations a file has gone through.
>

Well, taking the CasualDiff example: you may be debugging a problem where
the IDE generates too many spaces, and you'll find an if statement that is
causing that. By looking into the history, one can find the use case for
which the statement was added (so one does not break it), and also the
tests that were introduced to test the behavior (so that in the first phase
of fixing, one can run only a subset of tests, not all of them).

Some of this may be available by running tests (esp. for things like
CasualDiff, where there are quite a few tests), but that takes time, while
looking at the history is quite fast.
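
To illustrate the kind of history lookup described above, here is a minimal
sketch that assumes the sources have already been converted to git (the
annotate link above is Mercurial). It builds a "pickaxe" search, git log -S,
which lists the commits that added or removed a given piece of text in one
file. The function names, the example snippet and the repository path are
hypothetical, chosen only for illustration.

```python
import subprocess
from typing import List


def pickaxe_command(snippet: str, path: str) -> List[str]:
    """Build a `git log -S<snippet> -- <path>` command.

    -S lists commits that change the number of occurrences of `snippet`,
    i.e. the commits that introduced or removed that piece of code.
    """
    if not snippet:
        raise ValueError("snippet must not be empty")
    # The attached form (-S<text>) is used so the text is never parsed
    # as a separate argument.
    return ["git", "log", "--oneline", "-S" + snippet, "--", path]


def run_in_repo(command: List[str], repo_dir: str) -> str:
    """Run the command inside an existing clone and return its output."""
    result = subprocess.run(
        command, cwd=repo_dir, check=True, capture_output=True, text=True
    )
    return result.stdout


# Hypothetical usage, assuming a local clone in ./netbeans:
#   cmd = pickaxe_command(
#       "if (needsSpace)",  # made-up snippet, not taken from CasualDiff
#       "java.source.base/src/org/netbeans/modules/java/source/save/CasualDiff.java",
#   )
#   print(run_in_repo(cmd, "netbeans"))
```

The commit messages found this way usually point at the issue and the tests
that came with the change, which is the information needed to pick the
subset of tests to run first.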

Jan


> Of course, all that said, if the larger git repo was supported in the
> infra, then users cloning the repository from git can use depth, and most
> won’t need all the history. Too, until one gets into the small hundreds of
> GBs, cloning and spanning from a “cloud” perspective isn’t really that big
> of a deal, so I agree this is a github business decision limitation more
> than anything else.
>
> If git did not support depth cloning, I would very much argue that at some
> point too much history has diminishing returns, as it becomes
> impractical for a community project's members (all of them) to clone many,
> many GB of data. Data plans and connections are not the same worldwide,
> and generally that history is more useful to see if a regression has been
> introduced and what exactly changed. The gamble being a certain depth
> “back” will not have the regressions, and the current code is fairly well
> vetted, so for most cases, taking a depth based starting point considering
> such deep history, should be “mostly” safe and manageable.
>
> Thanks,
>
> Wade
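
The depth-limited clone mentioned above can be sketched as follows. This is
only an illustration: the repository URL, the target directory and the
default depth of 50 are placeholders, not values anyone in this thread
proposed. It relies on two standard git commands, git clone --depth to fetch
only the most recent commits, and git fetch --unshallow to download the
remaining history later if an old regression has to be tracked down.

```python
import subprocess
from typing import List


def shallow_clone_command(url: str, dest: str, depth: int = 50) -> List[str]:
    """Build a `git clone --depth <n>` command (truncated history)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return ["git", "clone", "--depth", str(depth), url, dest]


def unshallow_command() -> List[str]:
    """Build the command that turns a shallow clone into a full one."""
    return ["git", "fetch", "--unshallow"]


def run(command: List[str], cwd: str = ".") -> None:
    """Execute a command, raising CalledProcessError on failure."""
    subprocess.run(command, cwd=cwd, check=True)


# Hypothetical usage (placeholder URL, not a real repository):
#   run(shallow_clone_command("https://example.invalid/netbeans.git", "netbeans"))
#   run(unshallow_command(), cwd="netbeans")  # only when full history is needed
```

With this approach the full history stays on the server for those who need
it, while most contributors download only a small part of it.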
