Hi everyone,

I’ve noticed some questionable practices around commits going into master
lately (and historically, to be honest) and I want to remind everyone about
some best practices for commit and release quality.

   -

   *Please don’t mix partial, unrelated changes into a commit.* This makes
   it very difficult to manage branches and to correctly address bugs when
   they are found because of conflicts when picking or reverting commits.

   If you need to revert a commit, but it made unrelated changes then you
   have to go resolve the conflict by hand. When backporting to a release
   branch, unrelated changes require you to pull in more commits to get
   patches to apply, or again resolve the conflicts by hand. Every conflict
   makes maintenance take longer and increases the risk of mistakes.

   When submitting and reviewing patches, please think about whether
   cherry-picking or reverting the commit would cause unintended changes, and
   remove them.
   -

   *Please don’t commit unfinished changes.* If you want to pull features
   from master to a release branch, it is much harder if you also need to
   track down all of the additional commits needed to actually make it work.
   I’m not talking about splitting work into reasonable chunks; I’m talking
   about getting commits in before they’re reasonably finished. This makes it
   hard to know where the remaining changes were merged, or if they were
   merged at all.

   Often, I see this combined with the first problem. For example, #19925
   <https://github.com/apache/spark/pull/19925#issuecomment-351621232> was
   merged with a comment: “Merging to master to unblock #19926. If there are
   more comments, we can address them in #19926.” This change was “Add
   Structured Streaming APIs to DataSourceV2”, something that should clearly
   be finished before committing. Instead, parts of it might be lurking in
   “Split StreamExecution into MicroBatchExecution and StreamExecution.”
   -

   *Please don’t commit before reviews are finished.* This can be a tough
   call at times, but it is usually pretty clear. Are there people who should
   review a PR? Is there still ongoing discussion in the PR? Is there a +1
   from a Spark committer? (I recently noticed both ignoring an ongoing
   discussion and not getting a committer +1 on the same issue!)

   This is also a community problem. Whether or not it is intentional,
   committing prematurely sends a negative message to the community that we
   need to avoid. A good skill for a committer is to know when you should get
   agreement or consensus, even if you have a +1.
   -

   *Please don’t rush commits.* Rushing is cutting corners, and that almost
   always causes the problems above or others.

   #20427 is a good example
   
<https://github.com/apache/spark/pull/20427/files#diff-6d245f3b658c10a9d40d6e1772df3d24L28>,
   where admittedly rushing a commit led to trying to mix unrelated changes
   into the PR. #19925
   <https://github.com/apache/spark/pull/19925#issuecomment-351621232> was
   committed unfinished to unblock another commit, despite adding a public API
   that should require a careful review.

We all mess this up, myself included, and my intent is not to point
fingers, but to show examples of where we can improve. It is just far
easier to get a branch committed as-is than to adhere to these guidelines,
but these are important for our releases and downstream users.

Thanks for reading,

rb
​
-- 
Ryan Blue
Software Engineer
Netflix

Reply via email to