While we are waiting for the remaining SGAs, I think now is a good time to
start thinking about how the move to ASF infrastructure will affect the
Daffodil project. ASF supports different infrastructure than we have used
in the past, so some workflow changes will be required, and some
changes should be made to reduce the barrier to entry for new contributors.

== Documentation ==

Daffodil uses Confluence for user and developer documentation. ASF
provides a Confluence instance, so we just need to transfer the
information. This may be a good time to reorganize our Confluence pages
and remove or update old information, but otherwise things should work
exactly the same.

ASF also provides web hosting for static content (e.g. downloads, a
high-level Daffodil overview, mailing list info, etc.) as a sort of
landing page for the project. This site will need to be developed. I'm
not too familiar with website building tools, and there are many out
there, so this will take more research. We should look at what other
Apache projects use for inspiration.

== Issue Tracking ==

ASF provides JIRA for tracking issues, and we even already have an empty
JIRA project set up for us at:

  https://issues.apache.org/jira/projects/DAFFODIL

Daffodil used JIRA before Apache, so the workflow should not be too
different. We should probably maintain a very similar workflow in this
regard (e.g. all changes require a bug, assign to self when starting
progress, resolve issues when fixed, etc.). We can flesh out a formal
description of the issue tracking process for new contributors to
follow, but I think this is all fairly standard and will remain mostly
unchanged from what we had before. I'm sure there will be some changes
to the overall workflow (e.g. removal of scala-new, how bugs will be
officially closed, etc.), but they will all be relatively minor and not
really infrastructure related, so I don't want to spend too much time on
them in this email.

One piece of JIRA-related effort is transferring our existing bugs to
the new JIRA. Based on reading through the INFRA JIRA and seeing other
projects do this, we mainly just need to export our existing bugs as
JSON and create a mapping between the old and new JIRA user accounts.
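The user-mapping step could look something like the minimal Python sketch below. It makes a hypothetical assumption: the export is a JSON list of issues with "reporter", "assignee", and "comments"/"author" fields. I haven't checked the actual export or ASF import format, so the field names and the remap_users helper are illustrative only. Unmapped usernames are left unchanged and collected so they can be resolved by hand.

```python
import json

# Hypothetical field names: the real export format from our old JIRA, and the
# format the ASF JIRA importer expects, still need to be checked.
USER_FIELDS = ("reporter", "assignee")


def remap_users(issues, user_map):
    """Rewrite user references in exported issues using user_map.

    Returns the remapped issues and the set of usernames with no mapping,
    so those can be resolved by hand before importing.
    """
    unmapped = set()

    def lookup(name):
        if name is None:
            return None
        if name not in user_map:
            unmapped.add(name)
            return name  # leave unknown users unchanged
        return user_map[name]

    remapped = []
    for issue in issues:
        new_issue = dict(issue)  # shallow copy; the input is not modified
        for field in USER_FIELDS:
            if field in new_issue:
                new_issue[field] = lookup(new_issue[field])
        if "comments" in issue:
            new_issue["comments"] = [
                {**comment, "author": lookup(comment.get("author"))}
                for comment in issue["comments"]
            ]
        remapped.append(new_issue)
    return remapped, unmapped


if __name__ == "__main__":
    exported = json.loads("""[
      {"key": "OLD-1", "reporter": "jdoe", "assignee": null,
       "comments": [{"author": "asmith", "body": "Confirmed."}]}
    ]""")
    user_map = {"jdoe": "johndoe"}  # hypothetical old -> ASF account names
    issues, missing = remap_users(exported, user_map)
    print(json.dumps(issues, indent=2))
    print("unmapped:", sorted(missing))
```

Leaving unknown users untouched, rather than failing, means the script can be rerun as the mapping fills in. The printed "unmapped" list acts as a to-do list for the account mapping.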

== Patch Submission & Review ==

This is where we will likely see the most infrastructure-related
change, and where I'd like to have some more in-depth discussion.
Previously, the Daffodil workflow had all committers push changes to
"review" branches in the main repo, where the changes were reviewed and
finally rebased onto the development branch. This could continue to
work for us, but it has some downsides. As we gain more committers, more
review branches could make the main repo pretty messy. And in general we
probably don't want lots of unreviewed code in the main repo, even if it
lives on separate branches. Furthermore, and probably the biggest reason
not to continue this practice, contributors who are not committers
would not have the privileges to add review branches to the main repo,
so they would need to follow a different process than committers. I
propose that committers follow the exact same contribution process as
non-committers, so we need a patch submission and review process that
works for both. A few options follow:

The first, and I think the traditional method for Apache projects, is
for contributors to attach a patch to a JIRA ticket. This is convenient
in that JIRA tickets and patches are closely tied together, but creating
a patch file and uploading it might not be as easy as it could be. Once
a patch is attached, a process is automatically kicked off to run tests
on the patch and start a review at reviews.apache.org via ReviewBoard.
This seems like a good workflow, but I personally find ReviewBoard
difficult to use, and it lacks some features that I've become accustomed
to after using Crucible for Daffodil in the past.

A similar method would be to use GitHub. Apache mirrors the Daffodil git
repository to GitHub, and with Apache gitbox, can even accept GitHub
pull requests. This has some very obvious benefits. Many people are
already very familiar with GitHub, so it could be a good way to attract
more contributors. It also has an intuitive interface for creating and
accepting pull requests, again reducing the barrier to entry. GitHub
also integrates very cleanly with TravisCI to test pull requests. Note
that JIRA must still be the bug tracker, and gitbox copies all review
comments to the original JIRA bug as comments. This is good for tracking
the review comments, but makes JIRA bugs pretty messy and hard to
follow. There is also some criticism of the GitHub code review
interface, and some people simply do not want or do not have a GitHub
account. Like the above option, it also requires network connectivity to
draft reviews, though this may be a non-issue nowadays.

Another alternative, which is perhaps less modern but tried and true,
is to use something similar to the Linux kernel review process. In this
process, all patches are emailed directly to the mailing list via
git-send-email. Review comments happen as replies to those emails,
allowing for complex, easily branching discussions. Committing a patch
requires that a committer save the email and apply it using git-am. One
big benefit of this process over the others is that patches and review
comments are much more likely to be seen, since they go directly to the
dev list. This encourages activity and allows new devs to learn as they
see the patches. It also has a low barrier to entry: one just needs to
configure git-send-email to use the SMTP server of their choice and run a
git command. It has also been shown to scale very well, is well
understood, and is well documented. It also follows the ASF motto of "If
it didn't happen on a mailing list, it didn't happen." Note that this
would not remove JIRA for bug tracking, so a downside is that it may
require some manual updates to JIRA, such as noting that a patch has
been submitted to the mailing list. This also does not tightly integrate
with continuous integration systems, so it might require committers to
manually test patches. That is not necessarily a bad thing, and tools
like patchwork/snowpatch exist to send mailing list patches to a Jenkins
server, though they are not currently supported by Apache infra. Maybe
the biggest downside is that, while people are familiar with email, it
lacks some nice features of other review tools, like marking comments as
resolved, syntax highlighting, etc. It's simple, but minimal. The
article and comments below have some good discussion about the pros and
cons of email for patches and how it works well for the Linux kernel:

  https://lwn.net/Articles/702177/
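The committer side of the email workflow could look roughly like the minimal, stdlib-only Python sketch below. It picks the patches out of a set of raw list emails and orders a series before handing it to git-am. It assumes the subject conventions that git format-patch produces, such as "[PATCH 2/3] ..." or "[PATCH v2 1/3] ...". It also assumes all the messages belong to a single version of one series. The function names are hypothetical, and tools like patchwork handle this far more robustly.

```python
import email
import re

# Subject prefixes as produced by git format-patch, e.g. "[PATCH] ...",
# "[PATCH 2/3] ..." or "[PATCH v2 1/3] ...". Anchored at the start so that
# review replies ("Re: [PATCH ...]") do not match.
PATCH_RE = re.compile(
    r"^\[PATCH(?:\s+v(?P<version>\d+))?"
    r"(?:\s+(?P<index>\d+)/(?P<total>\d+))?\]\s*(?P<title>.*)$"
)


def parse_patch_subject(subject):
    """Return (version, index, total, title) for a patch subject, or None."""
    # Collapse folded header whitespace into single spaces.
    subject = " ".join(subject.split())
    match = PATCH_RE.match(subject)
    if match is None:
        return None
    version = int(match.group("version") or 1)
    index = int(match.group("index") or 1)
    total = int(match.group("total") or 1)
    return version, index, total, match.group("title")


def order_series(raw_messages):
    """Return the patch emails of one series in the order git am should apply them.

    Non-patch emails (e.g. review replies) and cover letters ("[PATCH 0/N]")
    are skipped. Assumes every message belongs to the same series version.
    """
    patches = []
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        parsed = parse_patch_subject(msg.get("Subject", ""))
        if parsed is None:
            continue
        _, index, _, _ = parsed
        if index == 0:
            continue  # cover letter, nothing to apply
        patches.append((index, raw))
    patches.sort(key=lambda item: item[0])
    return [raw for _, raw in patches]
```

Each returned message could then be saved to a file and applied in order with git am. Replies such as "Re: [PATCH 1/2] ..." are deliberately skipped, since those are review comments rather than patches.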

I'm sure there are many other options that I have not considered. I'm
definitely open to alternatives.

== Continuous Integration ==

Previously, Daffodil used Bamboo for continuous integration. ASF does
not support this, but does support a few alternatives:

  https://ci.apache.org/

We have had experience setting up Daffodil to run on Jenkins in the
past, so Jenkins seems preferable. That said, both Jenkins and Buildbot
appear to meet the necessary requirements, so either would likely work.
We could also provide a TravisCI configuration so that people who
maintain a GitHub fork (regardless of the patch submission process)
could take advantage of that service.

== Maven Repository ==

Daffodil used a Nexus repository on the NCSA servers. Apache infra
provides a Nexus server, so this should be virtually unchanged. We just
need to publish to a different server and tweak our release process to
follow the Apache release guidelines.

- Steve
