Hi, Ryan, Thanks for your inputs. These comments are pretty helpful! Please continue to help us improve Spark and Spark community.
Thanks again, Xiao 2018-01-30 12:58 GMT-08:00 Ryan Blue <rb...@netflix.com.invalid>: > Hi everyone, > > I’ve noticed some questionable practices around commits going into master > lately (and historically, to be honest) and I want to remind everyone about > some best practices for commit and release quality. > > - > > *Please don’t mix partial, unrelated changes into a commit.* This > makes it very difficult to manage branches and to correctly address bugs > when they are found because of conflicts when picking or reverting commits. > > If you need to revert a commit, but it made unrelated changes then you > have to go resolve the conflict by hand. When backporting to a release > branch, unrelated changes require you to pull in more commits to get > patches to apply, or again resolve the conflicts by hand. Every conflict > makes maintenance take longer and increases the risk of mistakes. > > When submitting and reviewing patches, please think about whether > cherry-picking or reverting the commit would cause unintended changes, and > remove them. > - > > *Please don’t commit unfinished changes.* If you want to pull features > from master to a release branch, it is much harder if you also need to > track down all of the additional commits needed to actually make it work. > I’m not talking about splitting work into reasonable chunks; I’m talking > about getting commits in before they’re reasonably finished. This makes it > hard to know where the remaining changes were merged, or if they were > merged at all. > > Often, I see this combined with the first problem. For example, #19925 > <https://github.com/apache/spark/pull/19925#issuecomment-351621232> > was merged with a comment: “Merging to master to unblock #19926. If there > are more comments, we can address them in #19926.” This change was “Add > Structured Streaming APIs to DataSourceV2”, something that should clearly > be finished before committing. Instead, parts of it might be lurking in > “Split StreamExecution into MicroBatchExecution and StreamExecution.” > - > > *Please don’t commit before reviews are finished.* This can be a tough > call at times, but it is usually pretty clear. Are there people who should > review a PR? Is there still ongoing discussion in the PR? Is there a +1 > from a Spark committer? (I recently noticed both ignoring an ongoing > discussion and not getting a committer +1 on the same issue!) > > This is also a community problem. Whether or not it is intentional, > committing prematurely sends a negative message to the community that we > need to avoid. A good skill for a committer is to know when you should get > agreement or consensus, even if you have a +1. > - > > *Please don’t rush commits.* Rushing is cutting corners, and that > almost always causes the problems above or others. > > #20427 is a good example > > <https://github.com/apache/spark/pull/20427/files#diff-6d245f3b658c10a9d40d6e1772df3d24L28>, > where admittedly rushing a commit led to trying to mix unrelated changes > into the PR. #19925 > <https://github.com/apache/spark/pull/19925#issuecomment-351621232> > was committed unfinished to unblock another commit, despite adding a public > API that should require a careful review. > > We all mess this up, myself included, and my intent is not to point > fingers, but to show examples of where we can improve. It is just far > easier to get a branch committed as-is than to adhere to these guidelines, > but these are important for our releases and downstream users. > > Thanks for reading, > > rb > > -- > Ryan Blue > Software Engineer > Netflix >