Calcite has always been a "Jira first" project, where all significant
commits have a Jira case number (CALCITE-nnnn). We've not allowed
patch attachments since the very early days, so each of those commits
also has a GitHub pull request (PR).

Given that discussion can occur on the Jira case and the PR, does it
matter where that discussion occurs? In my opinion, it makes a great
deal of difference.

In engineering, it is essential to separate the specification of a
change (bug or feature request) from implementation. The specification
is what that change does, and the implementation is how it does it.
The specification can be understood by the end-user of the change
(often the user who writes SQL queries, but sometimes an engineer who
is using Calcite's public or private APIs), whereas an implementation
may include a brief description of an algorithm but is mainly just
code (and tests).

Which is more important: specification or implementation? In my
opinion, specification is way more important. From a good description
of the problem, even a good one-line summary, an engineer can in most
cases create an implementation. The specification also serves
end-users (reading the release notes), serves as documentation for
future users of the feature, and helps future maintainers figure out
how the project fits together. But if all we have is code, the only
way to understand what has been done is to read the code. This doesn't
scale.

This has come up a couple of times recently.

In https://issues.apache.org/jira/browse/CALCITE-7013 /
https://github.com/apache/calcite/pull/4374 there were discussions in
both the Jira and GitHub about whether this was even a desirable
change. Mihai ended up merging the PR even though I had said "This is
not a bug" in the Jira case. This is basically one committer
overriding (albeit unintentionally) another committer's -1.

In https://issues.apache.org/jira/browse/CALCITE-7029 /
https://github.com/apache/calcite/pull/4392 the summary is "Support
DPhyp to handle various join types", which is meaningless even to
someone like me who follows academic work on query optimization.
Jensen added a comment in the PR asking for a link to the paper where
the 'DPhyp' term was defined. (Thank you Jensen!) But really, all work
reviewing the PR should stop, until we have a good description in the
Jira case.

I would like us to adopt two policies:
 * A committer should not merge a PR until the Jira has a good summary
and description.
 * Discussion in a PR about specification (what, as opposed to how)
should be moved to Jira or the dev list.

(Personally, I will not even look at a PR until the Jira is in good
shape, but I don't expect most people would go that far.)

Do people have comments on how we use Jira vs GitHub PRs, and how we
balance specification, implementation, and tests?

Julian
