Re: Need help deciding between subtree and submodule

Robert Dailey Thu, 19 Mar 2015 14:57:42 -0700

On Wed, Mar 18, 2015 at 6:04 PM, Doug Kelly <dougk....@gmail.com> wrote:
> On Wed, Mar 18, 2015 at 3:20 AM, Chris Packham <judge.pack...@gmail.com> wrote:
>> My $0.02 based on $dayjob
>>
>> (disclaimer I've never used subtree)
>>
>> On Wed, Mar 18, 2015 at 11:14 AM, Robert Dailey
>> <rcdailey.li...@gmail.com> wrote:
>>> At my workplace, the team is using Atlassian Stash + git
>>>
>>> We have a "Core" library that is our common code between various
>>> projects. To avoid a single monolithic repository and to allow our
>>> apps and tools to be modularized into their own repos, I have
>>> considered moving Core to a subtree or submodule.
>
> $DAYJOB has actually tried both... with varying levels of success.  As
> you note, subtree looks wonderful from a user perspective, but behind
> the scenes, it does have issues.  In our case, subtree support was
> modified into Gerrit, and this became cumbersome and difficult to
> maintain (which is the reason we eventually dropped support for
> subtree).  Submodules have more of a labor-intensive aspect, but
> are far more obvious about what actions have been taken (IMHO).
> Either way, both our developers' needs were satisfied: the code was
> tracked cleanly, and there wasn't a configuration mismatch where
> a dependency was able to change versions without implicit direction.
>
>>
>> Our environment is slightly different. Our projects are made up
>> entirely of submodules, we don't embed submodules within a repo with
>> actual code (side note: I know syslog-ng does so it might be worth
>> having a look around there).
>>
>> Day to day development is done at the submodule level. A developer
>> working on a particular feature is generally only touching one repo
>> notwithstanding a little bit of to-and-fro as they work on the UI
>> aspects. When changes do touch multiple submodules the pushes can
>> generally be ordered in a sane manner. Things get a little complicated
>> when there are interdependent changes, then those pushes require
>> co-operation between submodule owners.
>
> We've done both (all of the above? a hybrid approach?)... We've gone so
> far as to create 30 modules for every conceivable component, then tried to
> work that way with submodule, and our developers quickly revolted as it
> became too much of a maintenance burden.  The other direction (with
> hugely monolithic code) is also problematic since the module boundaries
> become blurred.  For us, usually cooperation between modules isn't so
> difficult, but the problem comes about when attempting to merge the
> changes.  Sometimes, it can take significant effort to ensure conflict-free
> merges (going so far as to require "merge lock" emails to ask other
> developers to hold off on merging commits until the change lands
> completely and the project is stable).
>
>>
>> The key to making this work is our build system. It is the thing that
>> updates the project repo. After a successful build for all targets (we
>> hope to add unit/regression tests one day) the submodules' SHA-1s are
>> updated and a new baseline (to borrow a clearcase term) is published.
>> Developers can do "git pull && git submodule update" to get the latest
>> stable baseline, but they can also run git pull in a submodule if they
>> want to be on the bleeding edge.
>>
>>> I tried subtree and this is definitely far more transparent and simple
>>> to the team (simplicity is very important), however I notice it has
>>> problems with unnecessary conflicts when you do not do `git subtree
>>> push` for each `git subtree pull`. This is unnecessary overhead and
>>> complicates the log graph which I don't like.
>>>
>>> Submodule functionally works but it is complicated. We make heavy use
>>> of pull requests for code reviews (they are required due to company
>>> policy). Instead of a pull request being atomic and containing any app
>>> changes + accompanying Core changes, we now need to create two pull
>>> requests and manage them in proper order. Things also become more
>>> difficult when branching. All around it just feels like submodule
>>> would interfere and add more administration overhead on a day to day
>>> basis, affecting productivity.
>>
>> We do have policies around review etc. With submodules it does
>> sometimes require engaging owners/reviewers from multiple
>> repositories. Tools like Gerrit can help, particularly where multiple
>> changes and reviewers are involved.
>
> Conflicts are definitely going to be a difficulty with either subtree or
> submodule (if multiple users could be changing the submodule), but
> if you have additional tools, such as Gerrit to look out for, submodule
> is the way to go since subtrees aren't supported within Gerrit. (Other
> tools may support it better: I'm honestly not sure?)  That would be
> my one word of caution: I don't know how well Stash supports subtree.
>
> You are absolutely correct about the difficulty of integrating submodule
> pull requests taking two steps.  This was an issue we worked hard
> to mitigate here, but at the end of the day, the work is necessary.
> Basically, we could also use a feature within Gerrit to automatically
> bring up a specific branch of the "superproject" when the submodule
> project on a certain branch changes, but this also rolls the dice a bit
> since it bypasses any code review or CI step.
>
>>
>>> Is there a third option here I'm missing? If only that issue with
>>> subtree could be addressed (the conflicts), it would be perfect enough
>>> for us I think. I have done all the stackoverflow reading and research
>>> I can manage at this point. I would really love some feedback from the
>>> actual git community on what would be a practical solution and
>>> structure here from a company perspective.
>>
>> There's the thing google use for android, I think it's called "repo".
>> There's a few googlers around here so maybe one of them will chime in.
>
> Repo is an interesting middle ground.  It does expect to interact with Gerrit,
> I believe, and it handles the splitting of changes/commits and reassembling
> them. (And, since it uses a manifest to track the submodules, it can handle
> pointing to a specific commit, the latest commit on a branch, etc.) The rumor
> I've heard around repo, however, is that there are plans to ensure parity
> (as much as possible) in submodule and eventually remove the need for
> repo.  But, this could also be a pipe dream...
>
> Perhaps it's worth noting another possibility: using and releasing the core
> library as an actual "library" which is tracked independently from the
> remainder of the code, and integrated at build/run time by your build tools.
> This would be closer to Maven's approach in the Java world, or a package
> manager in a system environment.  Even in the embedded world, tools in
> the Yocto project can handle gathering the dependencies in some sort of
> "recipe." I know this is a somewhat idealistic view, however... after all,
> there are reasons we're still using submodules ourselves. :)
>
> --Doug


Really appreciate the feedback, guys. Maybe I'm trying too hard to keep
things from changing when they should. Modularized repositories will
inherently involve more work since they are now separate entities.

From a risk perspective, do you feel it is better to use submodules
than subtrees? Subtrees seem a bit dangerous since they involve
history mutation and it seems possible to get into a messy state.

Submodules on the other hand are pretty difficult to manage when you
get submodule SHA1 conflicts. It's not obvious from the diff what has
changed between the 2 submodule commits, so some deep investigation
will be necessary.
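
One thing that might ease that investigation, as a rough sketch only: the
superproject records each submodule commit as a tree entry, so
`git rev-parse <rev>:<path>` should give the recorded SHA1, and the
submodule's own log can then list what lies between the two. The function
name `submodule_changes` and its arguments are made up for illustration,
and it assumes it is run from the top level of the superproject with the
submodule checked out.

```shell
#!/bin/sh
# Sketch: list the submodule commits that separate two superproject
# revisions. The function and argument names are hypothetical.
# Assumes: run from the top level of the superproject, and the
# submodule is checked out at <submodule-path>.
# Usage: submodule_changes <old-super-rev> <new-super-rev> <submodule-path>
submodule_changes() {
    old_rev=$1
    new_rev=$2
    sub_path=$3

    # "<rev>:<path>" resolves to the commit ID that the superproject
    # records for the submodule at that revision.
    old_sha=$(git rev-parse "$old_rev:$sub_path") || return 1
    new_sha=$(git rev-parse "$new_rev:$sub_path") || return 1

    # Ask the submodule's own repository which commits separate the two
    # recorded IDs. With three dots and --left-right, commits reachable
    # only from the old side are marked "<" and those reachable only
    # from the new side are marked ">".
    git -C "$sub_path" log --oneline --left-right "$old_sha...$new_sha"
}
```

For example, `submodule_changes master feature Core` would compare what two
branches record for a submodule at `Core` (branch and path names here are
placeholders). During a conflicted merge the two candidate SHA1s should be
available as `git rev-parse :2:Core` (ours) and `git rev-parse :3:Core`
(theirs). `git diff --submodule=log` is a built-in alternative that prints
a similar commit list directly in the diff.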

No one on my team is experienced with Git. In fact, I do as much as I
can to help educate them. I feel like adding either of these will
cause them to get a negative outlook on Git itself. I already have to
listen to some of them tease me about how SVN never had problems :-(

Thanks again, guys. I'll keep thinking about it.
