While we are waiting for the remaining SGAs, I think now is a good time to start thinking about how the move to ASF infrastructure will affect the Daffodil project. ASF supports different infrastructure than we have used in the past, so some changes to our workflow will be required, and other changes should be made to lower the barrier to entry for new contributors.

== Documentation ==

Daffodil uses Confluence for user and developer documentation. ASF provides a Confluence instance, so we just need to transfer the information. This may be a good time to reorganize our Confluence pages and remove or update old information, but it should otherwise work exactly the same.

ASF also provides web hosting for static content (e.g. downloads, a high-level Daffodil overview, mailing list info, etc.) as a sort of landing page for the project. This will need to be developed. I'm not too familiar with website building tools, but there are many out there, so this will take more research. We should look at what other Apache projects use for inspiration.

== Issue Tracking ==

ASF provides JIRA for tracking issues, and we already have an empty JIRA project set up for us at:

https://issues.apache.org/jira/projects/DAFFODIL

Daffodil used JIRA before Apache, so the workflow should not change much. We should probably maintain a very similar workflow in this regard (e.g. all changes require a bug, assign to self when starting progress, resolve issues when fixed, etc.). We can flesh out a formal description and process for issue tracking for new contributors to follow, but I think this is all fairly standard and will remain mostly unchanged from what we had before. I'm sure there will be some changes to the overall workflow (e.g. removal of scala-new, how bugs will be officially closed, etc.), but they will all be relatively minor and not really infrastructure related, so I don't want to spend too much time on that in this email.

Note that one piece of effort related to JIRA is transferring our existing bugs to the new JIRA. Based on reading through the INFRA JIRA and seeing other projects do this, we mainly just need to export our existing bugs as JSON and create a user mapping between the JIRA accounts.
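
As a rough illustration of the user-mapping step, here is a minimal Python sketch. It assumes a simplified export layout (a top-level "issues" list whose entries have "reporter", "assignee", and "comments" with an "author"). That layout, the usernames, and the remap_users helper are all hypothetical, and would need to be adjusted to whatever the real export and the INFRA import process actually use.

```python
import json

# Hypothetical mapping from old JIRA usernames to ASF JIRA usernames.
USER_MAP = {
    "old_alice": "alice",
    "old_bob": "bob",
}

# Issue fields assumed to hold a username. This is an assumption about
# the export layout, not the actual JIRA export schema.
USER_FIELDS = ("reporter", "assignee")


def remap_users(export, user_map):
    """Return (remapped_copy, unmapped_usernames).

    Usernames found in user_map are rewritten. Usernames that are not in
    user_map are left unchanged and reported so they can be fixed by hand.
    """
    unmapped = set()

    def lookup(name):
        if name is None:  # e.g. an unassigned issue
            return None
        if name not in user_map:
            unmapped.add(name)
            return name
        return user_map[name]

    # Round-trip through JSON to get a deep copy of plain JSON data.
    remapped = json.loads(json.dumps(export))
    for issue in remapped.get("issues", []):
        for field in USER_FIELDS:
            if field in issue:
                issue[field] = lookup(issue[field])
        for comment in issue.get("comments", []):
            if "author" in comment:
                comment["author"] = lookup(comment["author"])
    return remapped, sorted(unmapped)


if __name__ == "__main__":
    # Made-up sample data, only to show the shape the sketch expects.
    sample = {
        "issues": [
            {
                "key": "EXAMPLE-1",
                "reporter": "old_alice",
                "assignee": "old_bob",
                "comments": [{"author": "carol", "body": "Looks good."}],
            }
        ]
    }
    remapped, unmapped = remap_users(sample, USER_MAP)
    print(json.dumps(remapped, indent=2))
    print("unmapped users:", unmapped)
```

Reporting unmapped usernames instead of failing on them seems like the safer default here, since some old accounts may have no ASF equivalent and would need a manual decision.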

== Patch Submission & Review ==

This is where we will likely have the most change relative to infrastructure, and where I am looking to have some more in-depth discussions.

Previously, the Daffodil workflow had all committers making changes to "review" branches in the main repo. The changes were reviewed and finally rebased onto the development branch. This could continue to work for us, but it has some downsides. As we gain more committers, more review branches could make the main repo pretty messy. And in general we probably don't want lots of unreviewed code in the main repo, even if it is on separate branches. Furthermore, and probably the biggest reason not to continue this practice, contributors who are not committers would not have the privileges to add review branches to the main repo, so they would need to follow a different process than committers.

I propose that all committers should follow the exact same contribution process as non-committers, so we need a patch submission and review process that works for both. A few options are below.

The first, and I think the traditional method for Apache projects, is for contributors to add a patch to a JIRA ticket as an attachment. This is convenient in that JIRA tickets and patches are closely tied together, but creating a patch file and uploading it might not be as easy as it could be. Once a patch is attached, a process is automatically kicked off to run tests on the patch and start a review at reviews.apache.org via ReviewBoard. This seems like a good workflow, but I personally find ReviewBoard difficult to use, and it lacks some features that I've become accustomed to after using Crucible for Daffodil in the past.

A similar method would be to use GitHub. Apache mirrors the Daffodil git repository to GitHub and, with the use of Apache GitBox, can even support accepting GitHub pull requests. This has some very obvious benefits.
Many people are already very familiar with GitHub, so this could be a good way to attract more contributors. It also has an intuitive interface for creating and accepting pull requests, again reducing the barrier to entry. GitHub also integrates very cleanly with TravisCI to test pull requests. Note that JIRA must still be the bug tracker, and GitBox copies all review comments to the original JIRA bug as comments. This is good for tracking the review comments, but makes JIRA bugs pretty messy and hard to follow. Also, there is some criticism of the GitHub code review interface, and some people simply do not want or have a GitHub account. Like the above, it also requires network connectivity to draft reviews, though this may be a non-issue nowadays.

Another alternative, which is maybe less modern but is pretty tried and true, is to use something similar to the Linux kernel review process. In this process, all patches are emailed directly to the mailing list via git-send-email. Review comments happen as replies to those emails, allowing for complex and easily branching discussions. Committing a patch requires that a committer save the email and apply it using git-am.

One big benefit of this process over the others is that patches and review comments are much more likely to be seen, since they go directly to the dev list. This encourages activity and allows new devs to learn as they see the patches. It also has a low barrier to entry: one just needs to configure git-send-email to use the SMTP servers of preference and run a git command. It has also been shown to scale very well, is well understood, and is well documented. It also follows the ASF motto of "If it didn't happen on a mailing list, it didn't happen."

Note that this would not remove JIRA for bug tracking, so a downside is that it may require some manual updates to JIRA, such as specifying that a patch has been submitted to the mailing list.
This also does not tightly integrate with continuous integration systems, so it might require committers to manually test patches (not necessarily a bad thing, and tools like patchwork/snowpatch exist to send mailing list patches to a Jenkins server, though they are not currently supported by Apache infra). Maybe the biggest downside is that while people are familiar with email, it doesn't have some nice features of other review tools, like marking comments as resolved, syntax highlighting, etc. It's simple, but minimal. The article and comments below have some good discussions about the pros and cons of email for patches and how it works well for the Linux kernel:

https://lwn.net/Articles/702177/

I'm sure there are many other options that I have not considered. I'm definitely open to alternatives.

== Continuous Integration ==

Previously, Daffodil used Bamboo for continuous integration. ASF does not support this, but does support a few alternatives:

https://ci.apache.org/

We have had experience setting up Daffodil to run on Jenkins in the past, so this seems preferable. That said, it looks like both Jenkins and Buildbot meet the necessary requirements, so either would likely work. We could also provide a TravisCI configuration so that people who maintain a GitHub fork (regardless of the patch submission process) could take advantage of that service.

== Maven Repository ==

Daffodil used a Nexus repository on the NCSA servers. Apache infra provides a Nexus server, so this should be virtually unchanged. We just need to publish to a different server and tweak our release process to follow Apache release guidelines.

- Steve
