Re: Tight submodule bindings

Junio C Hamano Mon, 13 Jan 2014 14:15:09 -0800

"W. Trevor King" <[email protected]> writes:

> Additional metadata, the initial checkout, and syncing down
> -----------------------------------------------------------
>
> However, folks who do local submodule development will care about
> which submodule commit is responsible for that tree, because that's
> going to be the base of their local development.  They also care about
> additional out-of-tree information, including the branch that commit
> is on.


Well, please step back a bit.

They do not have to care about what local branch they use to build
follow-up work based on that commit.  In fact, they would want to be
able to develop more than one history on top, which means more than
one branch, each of which they can name themselves.

The only thing they care about is where the result of their
development _goes_, that is the URL and the branch of the remote
they are pushing back to.

I have a feeling that this is not specific to submodules---if you
did this:

        git init here
        cd here
        git fetch $there master
        git reset --hard FETCH_HEAD

and are given the resulting working tree to start hacking on, you
would not know where the history came from, or where your result
wants to go.  

So "the branch that commit is on" is the wrong thing to focus on.
"The branch that the history built on top of the commit wants to go
to" may be closer, and the two are different.

>  For already-initialized submodules, there are existing places
> in the submodule config to store this configuration:
>
> 1. HEAD for the checked-out branch,
> 2. branch.<name>.remote → remote.<name>.url for the upstream
>    subproject URL,
> 4. branch.<name>.rebase (or pull.rebase) to prefer rebase over merge
>    for integration,
> 5. …

What happened to 3 ;-)?

And also branch.<name>.merge may say on which of _their_ branches the
commit you learn in the superproject tree would be found.  If you
are using a centralized workflow, that would be the branch at your
central repository to update with your push, too.
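
To illustrate the point about branch.<name>.remote and
branch.<name>.merge, here is a minimal sketch.  The local branch
names (fix-typo, new-feature), the remote name, and the URL are all
made up for the example; only the configuration is written, and the
branches themselves are never created.  It shows two differently
named local branches that agree on where their results go.

```shell
# Sketch only: branch names, remote name, and URL are hypothetical.
set -e
repo=$(mktemp -d)
git init -q "$repo"

# Where the results go: one remote at the central repository.
git -C "$repo" config remote.origin.url git://example.invalid/sub.git

# Two independently named local lines of development, same destination.
for b in fix-typo new-feature; do
    git -C "$repo" config "branch.$b.remote" origin
    git -C "$repo" config "branch.$b.merge" refs/heads/master
done

# The local names differ; the branch at the remote does not.
dest_a=$(git -C "$repo" config "branch.fix-typo.merge")
dest_b=$(git -C "$repo" config "branch.new-feature.merge")
echo "fix-typo -> $dest_a, new-feature -> $dest_b"
```

Nothing here depends on what the local branches are called, which
is the reason a "local-branch" setting records the wrong thing.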

In any case, "local-branch" is wrong in two respects:

 1. (obvious) It does not follow our naming convention of not using
    dashed names for configuration variables.

 2. You do not care about the names you use locally.  The only thing
    you care about is where people meet at the central repository,
    i.e. where your result is pushed to.


> Syncing up
> ----------
>
> In the previous section I explained how data should flow from
> .gitmodules into out-of-tree configs.

s/should/you think should/, I think, but another way may be not to
copy at all and instead read directly from .gitmodules, which may be
a lot simpler.  Then upon switching branches of the top-level
superproject (which would update .gitmodules to the version on the
new branch), you may get different settings automatically.  But see
below.

> ...  Since you *will* want to share the upstream URL, I
> proposed using an explicit submodule.<name>.active setting to store
> the “do I care” information [2], instead of overloading
> submodule.<name>.url (I'd auto-sync the .gitmodule's
> submodule.<name>.url with the subproject's remote.origin.url unless
> the user opted out of .gitmodules syncing).

It may not be a good idea to blindly update to whatever happens to
be in .gitmodules, especially once submodule.*.url is initialized.

I think we would need a bit more sophisticated mechanism than "use
from .git/config if set, otherwise use from .gitmodules", at least
for the URL.  It may not be limited to the URL, and other pieces
of metainformation about submodules may need similar handling, but
I'd refrain from extending the scope of discussion needlessly at
this point.

Imagine that your embedded appliance project used to use a submodule
from git://k.org/linux-2.6 as its kernel component and now the
upstream of it is instead called just git://k.org/linux; the URL
specified by submodule.kernel.url in .gitmodules for the entry
submodule.kernel.path=kernel would have changed from the former to
the latter sometime in the superproject's history.  When you switch
back to an old version of the superproject to fix an old bug in its
maintenance track, you would still want to push associated fixes to
the kernel to k.org/linux, not linux-2.6, the latter of which may
now be defunct [*1*].  One way to make it work
semi-automatically is to keep track of what the user has seen in
.gitmodules and offer chances to update entries in .git/config.  If
you cloned the superproject recently, you would only know about the
new git://k.org/linux URL and that would be copied to .git/config
(which the current code does).  In addition, we would remember that
we saw the git://k.org/linux URL (which the current code does not).
Upon switching back to an old version, we could notice that the URL
in .gitmodules, which is git://k.org/linux-2.6, is not something the
user has seen, and at that point we could ask the user to tell us
what URL should be used, record the answer _and_ the fact that we
saw that old URL as well.  Then until the superproject updates the
URL the next time to a value that we have never seen, the user can
keep using the right URL without being asked [*2*].
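
The bookkeeping described above could be sketched roughly as
follows.  This is not existing behaviour: the multi-valued key
submodule.<name>.seenurl and the helper names are hypothetical, and
a temporary file stands in for .git/config.  The sketch only shows
the "have we seen this URL" check and when the user would be asked.

```shell
# Sketch only: "seenurl" is a hypothetical key, not something git knows about.
cfg=$(mktemp)   # stands in for the superproject's .git/config

# remember_url <name> <url>: record that this URL was seen in .gitmodules.
remember_url () {
    git config --file "$cfg" --add "submodule.$1.seenurl" "$2"
}

# url_seen <name> <url>: succeed if this exact URL was recorded before.
url_seen () {
    git config --file "$cfg" --get-all "submodule.$1.seenurl" 2>/dev/null |
    grep -Fxq -- "$2"
}

# check_url <name> <url-in-.gitmodules>: print "keep" or "ask".
check_url () {
    if url_seen "$1" "$2"; then echo keep; else echo ask; fi
}

# Fresh clone: .gitmodules says git://k.org/linux; copy it and remember it.
git config --file "$cfg" submodule.kernel.url git://k.org/linux
remember_url kernel git://k.org/linux

# Switching back to an old version: .gitmodules says linux-2.6, never seen.
first=$(check_url kernel git://k.org/linux-2.6)

# The user answers "keep using git://k.org/linux"; remember the old URL too.
remember_url kernel git://k.org/linux-2.6
second=$(check_url kernel git://k.org/linux-2.6)

echo "first visit: $first, later visits: $second"
```

The configured submodule.kernel.url is never touched by the check
itself; it changes only when the user answers the question.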


> 2. Checkout the new superproject branch.
>
>    2.1. For each old submodule that doesn't exist in the new branch,
>         blow away the submodule directory (assuming a new-style
>         .git/modules/… layout, and not an old-style submod/.git/…
>         layout).

Sure.

>    2.2. For each gitlinked submodule that didn't exist in the old
>         branch, setup the submodule as if you were doing the initial
>         cloning checkout (forcing a new local-branch to point at the
>         gitlinked commit).  If you find local out-of-tree
>         *superproject* configs that conflict with the .gitmodules
>         values, prefer the superproject configs.  Clobber submodule
>         configs and local branches at will (modulo
>         submodule.<name>.sync), because any submodule configs that the
>         user wanted to keep should have been added to the superproject
>         branch earlier (or stashed).

See above.


[Footnote]

*1* On the other hand, the switch of the submodule URL in the
superproject may have been between two separate projects (e.g. you
used to build your embedded appliance using BSD kernel but recent
versions use Linux kernel)---in such a project, you would want the
submodule URL to follow what is in .gitmodules when you switch
between old and new versions in the superproject.  But our
recommendation in such a case is to use different names for
submodules that are bound at the same path in the superproject so
that we can keep them as two separate repositories in .git/modules/
of the superproject.  So at least for the URL, there is no reason to
use the old version that appears in .gitmodules of the superproject
even when you check out an old version of it.
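
As a rough illustration of "different names at the same path", the
sketch below writes two stand-ins for the old and the new versions
of .gitmodules.  The submodule names kernel-bsd and kernel-linux
and the BSD URL are invented for the example; only the path
"kernel" and git://k.org/linux come from the discussion above.

```shell
# Sketch only: submodule names and the BSD URL are hypothetical.
old=$(mktemp)   # stands in for .gitmodules in the old superproject version
new=$(mktemp)   # stands in for .gitmodules in the new superproject version

git config --file "$old" submodule.kernel-bsd.path kernel
git config --file "$old" submodule.kernel-bsd.url git://example.invalid/bsd.git

git config --file "$new" submodule.kernel-linux.path kernel
git config --file "$new" submodule.kernel-linux.url git://k.org/linux

# name_at_kernel <file>: print the name of the submodule bound at "kernel".
name_at_kernel () {
    git config --file "$1" --get-regexp '^submodule\..*\.path$' |
    sed -n 's/^submodule\.\(.*\)\.path kernel$/\1/p'
}

old_name=$(name_at_kernel "$old")
new_name=$(name_at_kernel "$new")
echo "old: $old_name, new: $new_name"
```

Because the names differ, the two histories would live in separate
directories under .git/modules/ even though both are bound at the
same path.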

*2* This "remembering" may have to be more than a one-bit "have we
seen this" per distinct value.  For the URL, I think the one bit is
enough, but for other things, it might make sense to keep track of
"In the version of superproject with X in .gitmodules, the user
wants to use value Y" for each value X the user has seen.