On 06 Feb 2016, at 01:05, Junio C Hamano <gits...@pobox.com> wrote:

> Stefan Beller <sbel...@google.com> writes:
> 
>> Currently when cloning a project, including submodules, the --depth argument
>> is passed on recursively, i.e. when cloning with "--depth 2", both the
>> superproject as well as the submodule will have a depth of 2.  It is not
>> guaranteed that the commits as specified by the superproject are included
>> in these 2 commits of the submodule.
>> 
>> Illustration:
>> (superproject with depth 2, so A would have more parents, not shown)
>> 
>> superproject/master: A <- B
>>                    /      \
>> submodule/master:  C <- D <- E <- F <- G
>> 
>> (Current behavior is to fetch G and F)
> 
> I think the issue is deeper than merely "--depth 2", and you would
> be better off stepping back and thinking about various use cases to
> make sure that we know what kind of behaviour we want to support
> before delving into one particular corner case.  We currently pass
> the depth recursively, and I do not think it makes much sense, but I
> view it as a secondary question "among the behaviours we want to
> support, which one should be the default?"  It may turn out that not
> passing it recursively at all, or even passing a different depth, is
> a better default, but we wouldn't know until we know what the
> desirable behaviours are in various workflows.
> 
> If you are actively working on the superproject plus some submodules
> but you are merely using the submodule you depicted above, not
> working on changing it, even when you want the full history of the
> superproject (i.e. no "--depth 2"), you may not want history of the
> submodule.  Even though we have a way to say "I am not interested in
> this submodule AT ALL" by not doing "submodule init", not having
> anything at all at the path submodule/ may not allow you to build
> the whole thing, and we currently lack a way to express "I am not
> interested in the history of this thing, but I need at least the
> tree that matches the commit referred to by the superproject".
> 
> If you are working on a single submodule, trying to fix a bug in the
> context of the whole project, you might want to have a single-depth
> clone of the superproject and all other submodules, plus the whole
> history of the single submodule.
> 
> In either of these examples, the top-level "--depth" does not have
> much to do with what depth the user wants to use when cloning or
> fetching the submodule repositories.
> 
> I have a feeling (but I would not be surprised if somebody who uses
> submodules heavily has a counter-example from real life) that
> regardless of "--depth" or full clone, fetching the tip of matching
> branch is not a good default behaviour.  In your picture, even when
> depth is not given at all, there isn't much point fetching F or G.
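
A small model of the illustration may make the problem concrete. The sketch below is hypothetical: it represents the submodule history as a parent map using the single-letter commit names from the picture, assumes linear history, and treats a depth-limited fetch as "take the first N commits walking back from the tip", which is how "--depth" truncates history. It is not Git's implementation.

```python
# Hypothetical model of the illustration: each submodule commit maps to its
# parent (None for the root). Real Git keeps this in the commit objects.
SUBMODULE_PARENT = {"C": None, "D": "C", "E": "D", "F": "E", "G": "F"}

# Submodule commits recorded by the superproject's gitlinks
# (A -> C and B -> E in the illustration).
REFERENCED = {"C", "E"}


def shallow_fetch(tip, depth, parent):
    """Return the commits a depth-limited fetch starting at `tip` would
    get, following single-parent (linear) history only."""
    fetched = []
    commit = tip
    while commit is not None and len(fetched) < depth:
        fetched.append(commit)
        commit = parent[commit]
    return fetched


fetched = shallow_fetch("G", 2, SUBMODULE_PARENT)
print(fetched)                                # ['G', 'F']
print(sorted(REFERENCED - set(fetched)))      # ['C', 'E']
```

With depth 2 counted from the submodule tip, the fetch stops at F, so neither C nor E is present and checking out either superproject commit fails in the submodule.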

I really wonder in what cases people use the "--depth" option, too.
For instance, I have never used it in either of the two cases you
described above. I don't worry about a long-running "clone", as it
is usually a one-time operation.

However, in a continuous integration system that starts from a clean
state at the beginning of every run (e.g. Travis CI), "clone" is no
longer a one-time operation. In this case the "--depth 1" option makes
a lot of sense to me. This is the situation in which I noticed the
problem Stefan wants to tackle here, and I tried to make it tangible
with a test case [1].

On top of that, I think Git's error message is really confusing if
you clone a repo that has submodules with "--depth" and Git does not
fetch the necessary submodule commits:

Unable to checkout '$SHA' in submodule path 'path/to/submodule'

I tried to tackle that with [2], which would detect this case and
print the following error instead (slightly changed from the patch):

Unable to checkout '$SHA' in submodule path 'path/to/submodule'.
Try to remove the '--depth' argument on clone!

[1] https://www.mail-archive.com/git%40vger.kernel.org/msg82614.html
[2] https://www.mail-archive.com/git%40vger.kernel.org/msg82613.html


> 
>> So to fetch the correct submodule commits, we need to
>> * traverse the superproject and list all submodule commits.
>> * fetch these submodule commits (C and E) by sha1
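
The first step of that specification could look roughly like the sketch below. It is hypothetical: the superproject is modelled as a map from commit to (parent, gitlinks), where gitlinks maps a submodule path to the commit recorded there; history is assumed to be linear; and `submodule_commits` is an illustrative name, not a Git function. In real Git the gitlink entries come from the superproject's trees (as shown by `git ls-tree`), and fetching a commit by sha1 additionally depends on the server allowing it (see `uploadpack.allowReachableSHA1InWant`).

```python
# Hypothetical superproject model: commit -> (parent, {path: gitlink sha1}),
# using the commit names from the illustration.
SUPERPROJECT = {
    "A": (None, {"submodule": "C"}),
    "B": ("A", {"submodule": "E"}),
}


def submodule_commits(tip, depth, commits, path):
    """Walk up to `depth` superproject commits from `tip` (linear history
    assumed) and collect, without duplicates, the submodule commits
    recorded at gitlink `path`."""
    wanted = []
    commit = tip
    for _ in range(depth):
        if commit is None:
            break
        parent, gitlinks = commits[commit]
        sha1 = gitlinks.get(path)
        if sha1 is not None and sha1 not in wanted:
            wanted.append(sha1)
        commit = parent
    return wanted


print(submodule_commits("B", 2, SUPERPROJECT, "submodule"))  # ['E', 'C']
```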
> 
> I do not think requiring that C to be fetched when the superproject
> is cloned with --depth=2 (hence A and B are present in the result)
> is a good definition of "correct submodule commits".  The initial
> clone could be "superproject follows --depth, all submodules are
> cloned with --depth=1 at the commits referenced by the superproject
> tree"--by that definition, you need E but you do not want C.
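
The difference between the two definitions can be sketched with a hypothetical model (commit names from the illustration; `SUPERPROJECT`, `all_present` and `tip_only` are illustrative names). Collecting the gitlinks of every superproject commit present after a "--depth=2" clone asks for both C and E, while "depth 1 at the commit referenced by the checked-out superproject tree" asks only for E.

```python
# Hypothetical superproject after "--depth=2": commit -> gitlinks recorded
# in its tree (submodule path -> submodule commit), per the illustration.
SUPERPROJECT = {
    "A": {"submodule": "C"},
    "B": {"submodule": "E"},
}


def all_present(commits, path):
    """Submodule commits referenced by any superproject commit we have."""
    return sorted(
        {gitlinks[path] for gitlinks in commits.values() if path in gitlinks}
    )


def tip_only(tip, commits, path):
    """Submodule commits needed when each submodule is cloned with depth 1
    at the commit referenced by the superproject's checked-out tree."""
    sha1 = commits[tip].get(path)
    return [] if sha1 is None else [sha1]


print(all_present(SUPERPROJECT, "submodule"))    # ['C', 'E']
print(tip_only("B", SUPERPROJECT, "submodule"))  # ['E']
```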
> 
> As a specification of the behaviour, the above two might work, but I
> do not think that should be the implementation.  In other words,
> "The implementation should behave as if it did the above two" is OK,
> and it is also OK to qualify with further conditions to help the
> implementation.  For example, the current structure assumes that E
> and C are reachable from "some" ref in submodule, so that at least a
> whole clone of the submodule would give them to you--otherwise you
> would not be able to even build the superproject at A or B.  Perhaps
> it is OK to further require that, when you are working in a single
> branch mode and working on 'master', you are required to have
> commits C and E reachable on the 'master' branch in the submodule,
> and that may let you limit the need for such scanning of the
> history?
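
Under that extra requirement, the scan can stop as soon as every needed commit has been seen on the branch. A hypothetical sketch, assuming the submodule's 'master' is linear and using the commit names from the illustration (`depth_to_cover` is an illustrative name): it returns the smallest "--depth", counted from the tip of 'master', that still includes all required commits, or None if one of them is not on that branch.

```python
# Hypothetical linear history of the submodule's 'master' (tip is G).
SUBMODULE_PARENT = {"C": None, "D": "C", "E": "D", "F": "E", "G": "F"}


def depth_to_cover(tip, required, parent):
    """Walk first-parent history from `tip`, stopping once every commit in
    `required` has been seen; return the number of commits walked, or
    None if some required commit is not reachable this way."""
    remaining = set(required)
    depth = 0
    commit = tip
    while commit is not None and remaining:
        depth += 1
        remaining.discard(commit)
        commit = parent[commit]
    return depth if not remaining else None


print(depth_to_cover("G", {"C", "E"}, SUBMODULE_PARENT))  # 5
print(depth_to_cover("G", {"E"}, SUBMODULE_PARENT))       # 3
```

Covering both C and E needs all five commits from G, while covering only E (the "depth 1 at the referenced commit" definition) stops after three.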
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
