On Tue, May 3, 2016 at 4:56 PM, Jonathan Nieder <jrnie...@gmail.com> wrote:
> Stefan Beller wrote:
>
>> This is similar to the gitignore document, but doesn't mirror
>> the current situation. It is rather meant to start a discussion for
>> the right approach for mirroring repositories with submodules.
>
> Ooh.

Thanks for writing such a detailed answer. :)

>
> [...]
>> --- a/Documentation/git-submodule.txt
>> +++ b/Documentation/git-submodule.txt
>> @@ -59,6 +59,22 @@ instead of treating the other project as a submodule. Directories
>>  that come from both projects can be cloned and checked out as a whole
>>  if you choose to go that route.
>>
>> +Submodule operations can be configured using the following mechanisms
>> +(from highest to lowest precedence):
>> +
>> + * the command line for those commands that support taking submodule specs.
>> +
>> + * the configuration file `$GIT_DIR/config`.
>> +
>> + * the configuration file `config` found in the `refs/submodule/config` branch.
>> +   This can be used to overwrite the upstream configuration in the `.gitmodules`
>> +   file without changing the history of the project.
>> +   Useful options here are overwriting the base, where relative URLs apply to,
>> +   when mirroring only parts of the larger collection of submodules.
>> +
>> + * the `.gitmodules` file inside the repository. A project usually includes this
>> +   file to suggest defaults for the upstream collection of repositories.
>
> (This documentation probably belongs in gitmodules(5) --- then,
> git-submodule(1) could focus on command-line usage and point there for
> configuration information.)

That makes sense!
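
To make the precedence order from the patch concrete, here is a minimal
Python sketch. The layer names and the lookup() helper are made up for
illustration and are not Git API; each layer is modeled as a plain dict
of already-parsed settings, and the URLs are placeholders.

```python
# Hypothetical model of the proposed lookup order; nothing here is Git API.
# Layers are listed from highest to lowest precedence.
LAYER_ORDER = (
    "command_line",
    "git_dir_config",          # $GIT_DIR/config
    "refs_submodule_config",   # `config` file in refs/submodule/config
    "gitmodules",              # .gitmodules in the work tree
)


def lookup(key, layers):
    """Return (value, layer_name) from the first layer that defines key."""
    for name in LAYER_ORDER:
        settings = layers.get(name, {})
        if key in settings:
            return settings[key], name
    return None, None


layers = {
    "gitmodules": {"submodule.plugins.url": "../plugins/download-commands"},
    "refs_submodule_config": {
        "submodule.plugins.url": "https://mirror.example/download-commands"
    },
}
value, source = lookup("submodule.plugins.url", layers)
print(value, "from", source)
```

With only those two layers populated, the value from the
refs/submodule/config layer wins over the one in .gitmodules.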

>
> There are two aspects of this to be separated: what governs the behavior
> of commands running locally, and where we get information about
> submodules from a remote repository.

On a first read, this also seems to contain "historical context".

>
> Local commands
> --------------
> The original submodule design was that local commands rely on
> information from .git/config, and that information gets copied there
> from .gitmodules when a submodule is initialized.  That way, a local
> user can specify their preferred mirror or other options using some
> straightforward 'git config' commands.
>
> As a side effect, the settings in .git/config tell git which submodules
> to pay attention to (which submodules were initialized).
>
> When .gitmodules changes, the settings in .git/config are left alone,
> since the end user *might* have manually set something up and we don't
> want to trample on it.
>
> This design is somewhat problematic for a few reasons:
>
> - When I want to stop paying attention to a particular submodule and
>   start paying attention to it again later, all my local settings are
>   gone.
>
> - When upstream adds a new submodule, I have to do the same manual
>   work to change the options for that new submodule.
>
> - When upstream changes submodule options (perhaps to fix a URL
>   typo), I don't get those updates.
>
> A fix is to use settings from .git/config when present and fall back
> to .gitmodules when not.  I believe the submodule code has been slowly
> moving in that direction for new features.  Perhaps we can do so for
> existing features (like submodule.*.url) too.
>
> An alternative would have been to introduce a .git/info/submodules
> file that overrides settings from .gitmodules, analogous to
> .git/info/exclude overriding .gitignore and .git/info/attributes
> overriding .gitattributes.  We are already using .git/config for
> this so that doesn't seem necessary.

Still, I wonder whether it would be a worthwhile goal to eventually move
the information about submodules to .git/info/submodules, as that would
bring consistency across different features of Git?
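
The difference between "copy at init" and "fall back to .gitmodules" can
be sketched in a few lines of Python. This is only an illustration of
the URL-typo case above: the dicts stand in for .gitmodules and
.git/config, and the function names and example.com URLs are invented.

```python
# Hypothetical model; dicts stand in for .gitmodules and .git/config.

def init_submodule(name, gitmodules, local_config):
    """Original design: copy the URL into the local config once, at init."""
    key = f"submodule.{name}.url"
    local_config.setdefault(key, gitmodules[key])


def url_copy_on_init(name, local_config):
    """Original design: only the local config is consulted afterwards."""
    return local_config[f"submodule.{name}.url"]


def url_with_fallback(name, gitmodules, local_overrides):
    """Fallback design: a local override wins, otherwise read .gitmodules."""
    key = f"submodule.{name}.url"
    return local_overrides.get(key, gitmodules[key])


gitmodules = {"submodule.lib.url": "https://example.com/lbi.git"}  # typo
local_config = {}
init_submodule("lib", gitmodules, local_config)

# Upstream later fixes the typo in .gitmodules.
gitmodules["submodule.lib.url"] = "https://example.com/lib.git"

stale = url_copy_on_init("lib", local_config)
fresh = url_with_fallback("lib", gitmodules, {})
print(stale)
print(fresh)
```

In the copy-at-init model the user keeps the stale URL until they
intervene, whereas the fallback model picks up the fix as long as no
local override exists.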

>
> Remote repositories
> -------------------
> The .gitmodules file has some odd properties as a place to put
> configuration:
>
> - it is versioned.  There is no way to change URLs in an old version
>   of .gitmodules retroactively when a URL has changed.

I would not call it odd to have one versioned place. Suppose your
build process is updated and the new build process produces new
intermediate files. You would eventually add these files to the
.gitignore file, but when building old revisions with the new build chain
you'd be surprised by all those untracked files being displayed.
Or another example: Recently in git.git some test helper files were moved.
By checking out an older version of git you see a lot of test-* files
in your worktree although they were ignored at another revision.

That paragraph got longer than expected, but I just wanted to say that
being versioned can be either good or bad.

>
> - it is controlled by whoever writes history.  There is no way for me
>   to change the URLs in my mirror of https://gerrit.googlesource.com/gerrit
>   to match my mirror's different filesystem layout without producing
>   my own history that diverges from the commits I am mirroring.

To come up with an analogy to ignored files:
If I use a project and use a different build system, I may see untracked
files as they are not ignored by the .gitignore file.

Then I have a way of ignoring them nevertheless in .git/info/exclude.
Sharing this information beyond this repository is hard though, but
that wasn't seen as a feature yet?

>
> When the URLs in .gitmodules are relative URLs, this means that if
> I mirror a superproject, I have to mirror all its submodules, too,
> with the same layout.  It's not so easy for me to publish my copy
> of the parent project and the one subproject I made changes in --- I
> have to mirror everything.  In particular, this means I can't mirror
> https://gerrit.googlesource.com/gerrit to github.

because repository URLs work differently on these two hosts.
googlesource.com allows URLs that are nested at another level,
e.g. Gerrit references "../plugins/download-commands", so that the
remote URL becomes https://gerrit.googlesource.com/plugins/download-commands

On GitHub we cannot create another level of nesting, as their naming follows the
owner/name scheme.
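
As a rough illustration of the relative URL problem, here is a small
Python sketch. It is not Git's actual resolution code; it only
normalizes the path of the superproject URL joined with the relative
submodule URL. The helper name and the GitHub URL with "someuser" are
assumptions made up for the example.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def resolve_submodule_url(superproject_url, submodule_url):
    """Resolve a relative submodule URL against the superproject URL.

    Simplified sketch: absolute URLs are returned unchanged, and relative
    ones are resolved as if the superproject URL were a directory.
    """
    if not submodule_url.startswith(("./", "../")):
        return submodule_url
    parts = urlsplit(superproject_url)
    path = posixpath.normpath(posixpath.join(parts.path, submodule_url))
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


relative = "../plugins/download-commands"
on_googlesource = resolve_submodule_url(
    "https://gerrit.googlesource.com/gerrit", relative
)
# Hypothetical mirror location on a host with an owner/name layout.
on_github = resolve_submodule_url("https://github.com/someuser/gerrit", relative)
print(on_googlesource)
print(on_github)
```

The second result has three path components below the host
(someuser/plugins/download-commands), which does not fit an owner/name
layout, so the mirror cannot host the submodule where the relative URL
points.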

>
> When the URLs in .gitmodules are absolute URLs, this means that if
> I mirror a superproject, I cannot ask people consuming my mirror to
> use my mirrors of child projects, too.  I cannot publish my copy of
> the parent project and the one subproject I made changes in and
> expect people to be able to "git clone --recurse-submodules" the
> result successfully.


>
> It is as though refs were stored in a .gitrefs file, with all the
> attendant disadvantages, instead of being a separate component of
> the repository that a particular repository owner can manipulate
> without changing history.
>
> To fix this, we could allow additional .gitmodules settings to be put
> in another ref (perhaps something like "refs/repository/config" to allow
> sharing additional repository-specific configuration in other files
> within the same tree --- e.g., branch descriptions).  The semantics:
>
> * If there is a gitmodules file in refs/repository/config in the
>   repository I clone, then the submodule settings from it are stored
>   locally somewhere that overrides .gitmodules.  Perhaps
>   .git/info/<remotename>/gitmodules?
>
> * Later fetches from the remote would also update this gitmodules
>   file.
>
> * Settings from this gitmodules file can be overridden locally
>   using 'git config' until an explicit "git submodule sync" to
>   override the local configuration.
>
> What do you think?
>
> If two different remotes provide conflicting values for a setting
> in their gitmodules files, git would error out and ask the user
> to intervene with a tie-breaking "git config" setting.
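
The tie-breaking rule can be sketched in Python like this. It is a toy
model of the proposed semantics, not an existing Git feature; the
function, the exception, the remote names and the URLs are all
hypothetical.

```python
# Hypothetical model of the proposed semantics; not an existing Git feature.

class ConflictingRemoteSettings(Exception):
    """Raised when remotes disagree and no local setting breaks the tie."""


def effective_setting(key, remote_gitmodules, local_config):
    """Resolve key from per-remote gitmodules data, honoring local overrides.

    remote_gitmodules maps a remote name to the settings read from the
    gitmodules file in that remote's refs/repository/config.
    """
    if key in local_config:  # an explicit 'git config' value breaks ties
        return local_config[key]
    values = {
        settings[key]
        for settings in remote_gitmodules.values()
        if key in settings
    }
    if len(values) > 1:
        raise ConflictingRemoteSettings(f"{key}: {sorted(values)}")
    return values.pop() if values else None


key = "submodule.lib.url"
remotes = {
    "A": {key: "https://a.example/lib.git"},
    "B": {key: "https://b.example/lib.git"},
}
try:
    effective_setting(key, remotes, {})
except ConflictingRemoteSettings as err:
    print("conflict:", err)

print(effective_setting(key, remotes, {key: "https://b.example/lib.git"}))
```

The first call fails because A and B disagree; the second succeeds
because the local configuration names a winner.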

Let's look at an example with C mirroring from B, who mirrors from A.

The user who clones the superproject from C may want to obtain submodules
from either C or B or A. All this can be configured in
the refs/repository/config value of C, but in case it is not configured in C,
it may fall back to the same branch from B. When and how would B get
that branch?

Thanks for writing out this detailed brain dump :)
Stefan

>
> Thanks,
> Jonathan
