Hi, Ryan,

Thanks for your inputs. These comments are pretty helpful! Please continue
to help us improve Spark and Spark community.

Thanks again,

Xiao



2018-01-30 12:58 GMT-08:00 Ryan Blue <rb...@netflix.com.invalid>:

> Hi everyone,
>
> I’ve noticed some questionable practices around commits going into master
> lately (and historically, to be honest) and I want to remind everyone about
> some best practices for commit and release quality.
>
>    -
>
>    *Please don’t mix partial, unrelated changes into a commit.* This
>    makes it very difficult to manage branches and to correctly address bugs
>    when they are found because of conflicts when picking or reverting commits.
>
>    If you need to revert a commit, but it made unrelated changes then you
>    have to go resolve the conflict by hand. When backporting to a release
>    branch, unrelated changes require you to pull in more commits to get
>    patches to apply, or again resolve the conflicts by hand. Every conflict
>    makes maintenance take longer and increases the risk of mistakes.
>
>    When submitting and reviewing patches, please think about whether
>    cherry-picking or reverting the commit would cause unintended changes, and
>    remove them.
>    -
>
>    *Please don’t commit unfinished changes.* If you want to pull features
>    from master to a release branch, it is much harder if you also need to
>    track down all of the additional commits needed to actually make it work.
>    I’m not talking about splitting work into reasonable chunks; I’m talking
>    about getting commits in before they’re reasonably finished. This makes it
>    hard to know where the remaining changes were merged, or if they were
>    merged at all.
>
>    Often, I see this combined with the first problem. For example, #19925
>    <https://github.com/apache/spark/pull/19925#issuecomment-351621232>
>    was merged with a comment: “Merging to master to unblock #19926. If there
>    are more comments, we can address them in #19926.” This change was “Add
>    Structured Streaming APIs to DataSourceV2”, something that should clearly
>    be finished before committing. Instead, parts of it might be lurking in
>    “Split StreamExecution into MicroBatchExecution and StreamExecution.”
>    -
>
>    *Please don’t commit before reviews are finished.* This can be a tough
>    call at times, but it is usually pretty clear. Are there people who should
>    review a PR? Is there still ongoing discussion in the PR? Is there a +1
>    from a Spark committer? (I recently noticed both ignoring an ongoing
>    discussion and not getting a committer +1 on the same issue!)
>
>    This is also a community problem. Whether or not it is intentional,
>    committing prematurely sends a negative message to the community that we
>    need to avoid. A good skill for a committer is to know when you should get
>    agreement or consensus, even if you have a +1.
>    -
>
>    *Please don’t rush commits.* Rushing is cutting corners, and that
>    almost always causes the problems above or others.
>
>    #20427 is a good example
>    
> <https://github.com/apache/spark/pull/20427/files#diff-6d245f3b658c10a9d40d6e1772df3d24L28>,
>    where admittedly rushing a commit led to trying to mix unrelated changes
>    into the PR. #19925
>    <https://github.com/apache/spark/pull/19925#issuecomment-351621232>
>    was committed unfinished to unblock another commit, despite adding a public
>    API that should require a careful review.
>
> We all mess this up, myself included, and my intent is not to point
> fingers, but to show examples of where we can improve. It is just far
> easier to get a branch committed as-is than to adhere to these guidelines,
> but these are important for our releases and downstream users.
>
> Thanks for reading,
>
> rb
> ​
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Reply via email to