Re: [Kwant-devel] Thoughts about development workflow

Christoph Groth Sun, 25 Oct 2015 06:10:54 -0700

Anton Akhmerov wrote:

Christoph Groth wrote:
Let’s take a step back and reflect on what we actually need. Some of you have expressed the desire to have a more web-based or modern (or github-like…) Kwant development workflow.
Christoph, I believe you're downplaying the reasoning here, it's not as arbitrary. Ultimately, the reason why I raised this question together with others is not the similarity of the workflow to github or it being web-based, but the entry level for contributing. I have just checked with the Kwant survey that approximately half of the potential contributors don't use version control. While I believe the physics community as a whole benefits from the proliferation of version control, Kwant itself benefits from having contributors more than from enforcing best practices. So I want to make it reasonably easy to contribute to Kwant for people who don't have time or wish to learn details of git usage, PEP 8, PEP 257 (as long as the contributions are actually useful of course).

I believe that gitlab (or github) is not a solution to the specific challenge that you name. As you know, for a typical potential contributor who has no experience with version control, the gitlab/github workflow offers a whole series of difficulties. The minimum that such a person has to learn (beyond Kwant) in order to contribute is:


• git basics (repositories, commits, branches, merges, remotes)
• clunky git UI
• how to use git{hu,la}b
• the interplay of the official repository and “forks”

I think it’s much easier to propose to inexperienced contributors to work directly with the source tarballs, and then to ask them to send us the result of “./setup.py sdist”.

If potential users are keen to use git, the easiest way is to ask them to


git clone <kwant>
optional: git checkout -b my_topic_branch
repeat until done:
 <hack>
 git commit -am 'my commit'
git format-patch origin/master

And then to ask them to either send the *.patch files as attachments to kwant-devel, or to use “git send-email”. Observe how much simpler (conceptually and practically) this is than the standard github workflow for someone who knows neither git nor github.
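The patch-based workflow above can be tried end to end in a scratch repository. The following is only a sketch: the throwaway "upstream" repository, the file example.py, and the names and addresses are placeholders invented for illustration, not part of Kwant. The last step shows how a maintainer might apply the received patches with "git am".

```shell
#!/bin/sh
# Sketch only: a throwaway repository stands in for the real Kwant repository.
set -eu
work=$(mktemp -d)

# Hypothetical upstream repository with a single commit on "master".
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"
echo "print('hello')" > example.py
git add example.py
git commit -q -m "Initial commit"

# Contributor side: clone, hack on a topic branch, commit, export patches.
git clone -q "$work/upstream" "$work/contrib"
cd "$work/contrib"
git config user.name "Example Contributor"
git config user.email "contributor@example.org"
git checkout -q -b my_topic_branch
echo "print('world')" >> example.py
git commit -q -am "Print a second line"
git format-patch -o "$work/patches" origin/master

# Maintainer side: apply the received *.patch files as ordinary commits.
cd "$work/upstream"
git am -q "$work"/patches/*.patch
git log --format='%an: %s'
```

The contributor never needs a fork, a remote of their own, or a web account; the patch files carry the author name and the commit message with them.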

I agree with Joe that we should have a “contribute” tab on the website, probably right of “community”. Let’s consider explaining the above simple ways to contribute before asking people to use gitlab or github. This will not discourage those who already know these sites.

But then I’m actually not very optimistic that we will get valuable contributions from people who are not interested in learning some technicalities that, after all, are much simpler than contributing something useful to Kwant. This simple way of contributing has, by the way, always been possible. (We didn’t advertise it, though.)

I think the reason why we have received almost no external contributions to Kwant is that our development was not public and that we did not try hard enough to motivate potential contributors.

Taking another step back, you could ask why we bother using git at all if it's so hard to learn, and for me the answer is that it makes some tasks easier for us. The aspect relevant to our discussion is that it allows us to easily see what exactly was done, to ensure that it's easy to modify, reapply, or undo the changes, and to see who did what. I agree with you that it is not a matter of pedantry, but a very useful feature.

Git is nice for managing the source, but for one-time contributors it’s not essential. If someone sends me a tarball created by “./setup.py sdist” it’s quite trivial to convert this into a git patch.
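One way such a conversion might look is sketched below. Everything concrete in it is an assumption made for illustration: the scratch repository, the tag v1.0, the file module.py, and the tarball name are invented, and the tarball is produced here by hand rather than by “./setup.py sdist”. A real sdist tarball may also contain generated files that are not tracked in git, which would have to be left out before committing.

```shell
#!/bin/sh
# Sketch only: all repository, tag, and file names are hypothetical.
set -eu
work=$(mktemp -d)

# Stand-in for the project repository with one released version.
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"
printf 'def answer():\n    return 41\n' > module.py
git add module.py
git commit -q -m "Initial release"
git tag v1.0

# Stand-in for the contributor's tarball: the released tree plus one fix.
mkdir -p "$work/unpacked/project-1.0"
git archive v1.0 | tar -xf - -C "$work/unpacked/project-1.0"
printf 'def answer():\n    return 42\n' > "$work/unpacked/project-1.0/module.py"
tar -czf "$work/contribution.tar.gz" -C "$work/unpacked" project-1.0

# Maintainer side: start from the release the tarball was based on ...
git checkout -q -b from-tarball v1.0
# ... replace the tracked files by the content of the tarball ...
git rm -q -r .
tar -xzf "$work/contribution.tar.gz" --strip-components=1
# ... and record the difference as a commit credited to the contributor.
git add -A
git commit -q --author="Example Contributor <contributor@example.org>" \
    -m "Fix answer()"
git format-patch -1 -o "$work/patches"
```

The step that needs care is choosing the starting commit: if the tarball was based on a different version than the one checked out, the resulting patch would also revert unrelated changes.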

This still leaves us with freedom to decide how we use git, and the current approach consists of maintaining a linear history where every commit is meaningful. The "github" approach results in a history with a lot of merge commits and many commits on feature branches breaking things or being generally of poor quality, so it appears to hurt the usefulness of version control. However, looking around, I have found that git allows one to achieve comparable results with dirty branches. For example, '--first-parent' will make the history easy to navigate (e.g. like this: http://antonakhmerov.org/misc/first-parent.png), and rebase replaces cherry-pick. So it is possible to keep the history clean using the github approach, but it relies on using different conventions for what is considered clean. I do realize that this approach shifts some of the burden from the contributors to developers, since manipulating github-style history is marginally harder.

It’s a valid approach to consider topic branches as “dirty inside” and to consider the merges as the real commits. The only problem of principle that I see with this is the case where one would have liked to present the change as a series of several commits even in the “clean history” approach. The “dirty branches” approach has no way to present a non-trivial change as a logical series of patches that build on top of each other.

There is also a technical problem with this approach: http://devblog.nestoria.com/post/98892582763/maintaining-a-consistent-linear-history-for-git

Finally, the practical problem is that (to my knowledge) there are no tools that support it properly. True, gitk has a --first-parent option, but there’s no way to see what changes the merge actually introduces (other than running git diff manually). Run “gitk --first-parent” and try to see what changes the merges from Kwant’s stable branch introduce.
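For reference, the manual route mentioned above could look like the following sketch. It builds a throwaway repository with a “dirty” topic branch (all names are invented for illustration), lists the history along the first parent only, and then diffs the merge commit against its first parent to see what the merge introduces as a whole.

```shell
#!/bin/sh
# Sketch only: a scratch repository with hypothetical branch and file names.
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Developer"
git config user.email "developer@example.org"
echo "one" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# A "dirty" topic branch with two work-in-progress commits.
git checkout -q -b topic
echo "two" >> notes.txt
git commit -q -am "WIP: add line"
echo "three" >> notes.txt
git commit -q -am "WIP: add another line"

# Merge with an explicit merge commit so that the branch stays visible.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'topic'" topic

# History along the first parent only: the WIP commits are hidden.
first_parent_log=$(git log --first-parent --format=%s)
echo "$first_parent_log"

# What the merge introduces, relative to its first parent.
merge_diff=$(git diff HEAD^1 HEAD)
echo "$merge_diff"
```

This gives the combined change of the whole branch, but it has to be repeated by hand for every merge of interest, which is the inconvenience described above.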

• (p2) The github crowd has no obvious way to contribute to Kwant. I do not think that this is a big problem, but it would be certainly nice to allow contributions in a way that is familiar to many people. BUT I strongly insist on not giving up nice aspects of our own way of working because “everybody else is doing it like that”. Also, I think that it is crucial that it’s possible to contribute to a piece of software without having to accept the terms & conditions of some specific non-free software. Alas, many projects do not see this problem.
I would extend this: not only the github users have no obvious way to contribute, but also physicists with limited programming skills. Further, "everybody else is doing it like that" is indeed not an ultimate argument, but it is an argument nonetheless, since it saves time for contributors familiar with a common convention. As I argued above, I believe we can adopt the other workflow while not giving up the practical consequences of our current workflow.

As I detailed above, I think that learning git and github is quite a challenge for people with limited programming skills. Certainly much more than asking them to send us the modified files.

I think that to get valuable contributions tools are secondary. The key is to approach people and to motivate them. By having Kwant on github/gitlab we certainly lower the barrier for the many people who already know these tools. We will see how much this will bring.

• (p4) It could be beneficial to set up some way of automated testing of the code base (confusingly, that often is called “continuous integration”): If a patchset gets proposed, it would be nice to see whether all tests pass. But then, if a contributor is unable to internalize the simple rule that all tests should always pass, that’s a bad contributor. Also, if code review is done in a reasonable way, it should be trivial for the reviewer to execute the tests himself (as it is now).
As I already mentioned to you in person, this makes me a bad contributor :-) I try my best, but sometimes I forget. Automated tests allow both contributors and reviewers to spare some mental effort. Since the goal of programming is to make routine tasks automated, I'm very much in favor of automated testing.

Automated testing certainly does no harm and can be useful. But I see the danger of people starting to rely on it and not caring enough about the commits they push out. If we manage to keep commit quality high (and someone other than me maintains CI), I do not mind having CI at all!

So, in summary, I maintain my original analysis about what are the good parts of our workflow and what could be improved. :-)

If we had adopted a mailing-list-based workflow with a web-based bugtracker, a github mirror (with disabled issues), and the above simple instructions for inexperienced contributors, I claim that this would have worked well for beginners, github-aficionados, and regular Kwant contributors alike. And this is what I proposed in the very beginning of our discussions, if you remember.

I think that adopting a self-hosted gitlab is also a good compromise. Perhaps there’s even a psychological community-building effect that I underestimate. We will profit from the code review and issue tracking facilities in gitlab, _but_ I think that we should not stop asking regular contributors to present their work as carefully crafted patches. Also, IMHO we should advertise that we will also accept patches on the development mailing list.


Christoph
