The way I read Thomas' original email is that it's generally not a nice
sign for a contributor if her work gets reverted. We all come from
different backgrounds. For some, reverting is just a tool to get the job
done, for others it might come across as offensive.
I know of communities where reverting is the absolute last resort. Now,
Beam needs to find its own way. I think there are definitive advantages
to reverting quickly.
In the most obvious case, when our tests are broken and a fix is not
viable, reverting unblocks other contributors to test their code. I
think this has been working fine in the past.
In the less obvious case, an external Runner or system is broken due to
an update in the master. IMHO this does not warrant an immediate revert
on its own. As already mentioned, there should be some justification for
a rollback. This is not to make people's life harder but to figure out
whether the problem can be solved upstream or downstream, or with a
combination of both.
I think Thomas wanted to address this latter case. It seems like we're
all more or less on the same page. The core problem is more related to
communicating reverts in a way that helps contributors to save face and
the community to work efficiently.
Thanks,
Max
On 19.11.18 10:51, Robert Bradshaw wrote:
If something breaks Beam's post (or especially pre) commit tests, I
agree that rollback is typically the best option and can be done
quickly. The situation is totally different if it breaks downstream
projects in which Kenn's three points are good criteria for determining
if we should rollback, which should not be assumed to be the default option.
I would say the root cause of the problem is insufficient visibility and
testing. If external-to-beam tests (or production jobs) are broken in
such a way that rollback is desired, I would say the onus (maybe not a
hard requirement, but a high bar for exceptions) is on whoever is asking
for the rollback to create and submit an external test that demonstrates
the issue. It is their choice whether this is easier than rolling
forward or otherwise working around the breakage. This seems like the
only long-term sustainable option and should get us out of this bad
situation.
(As an aside, the bar for rolling back a runner-specific PR that brake
that runner may be lower, though still not automatic as other changes
may depend on it.)
- Robert
On Sat, Nov 17, 2018 at 7:35 PM Kenneth Knowles <k...@apache.org
<mailto:k...@apache.org>> wrote:
Just adapting my PR commentary to this thread:
Our rollback first policy cannot apply to a change that passes all
of Beam's postcommit tests. It *does* apply to Beam's postcommit
suites for each and every runner; they are equally important in this
regard.
The purpose of rapid rollback without discussion is foremost to
restore test signal and not to disrupt the work of other
contributors, that is why it is OK to roll back before figuring out
if the change was actually bad. If that isn't at stake, the policy
doesn't make sense to apply.
But...
- We have at least three examples of runners where there are
probably tests outside the Beam repo: Dataflow, Samza runner, and
IBM Streams.
- We also may have users that try running their production loads
against Beam master branch to learn early whether the next release
will break them.
These are success stories for Beam. We should respect these other
sources of information for what they are: users and vendors giving
us a heads up about changes that will be a problem if we release them.
Often rollback is still a good option but IMO it is no longer
automatically the best option, and may not even be OK. I would say
that the case must be made clearly and *publicly* that
(1) something is actually broken
(2) the revert fixes the problem
(3) revert is the best option
In this scenario there is time to consider. An important and common
case is that a perfectly fine change exposes something already
broken, so the best option may be sickbaying downstream or pinning
their version/commit of Beam until they can fix.
Kenn
On Fri, Nov 16, 2018 at 8:15 PM Ahmet Altay <al...@google.com
<mailto:al...@google.com>> wrote:
>
> It sounds like we are in agreement that addressing issues sooner
is better. I think reverting is in general the less stressful option
because it allows a solution to be developed in parallel. Even with
that, it is not the only option we have and based on the severity
and the complexity of the problem we can consider other options.
Fixing forward might be feasible in some cases.
>
> We can bring issues back to the mailing list. This would be akin
to bringing any issues to the mailing list. I think JIRA is a better
tool for that. These reverts are happening because of an issue, and
JIRA allows informing all involved parties, creates emails to the
issues list for later searching through mailing archives, and
creates a record of things in structured way with components.
>
> We could establish general policies about for all reverts to have
an issue (which we already do because they are regular PRs),
including all people in the discussion (including the author and
reviewers) and follow up with new tests to expands Beam's test coverage.
>
> On Fri, Nov 16, 2018 at 7:55 PM, Thomas Weise
<thomas.we...@gmail.com <mailto:thomas.we...@gmail.com>> wrote:
>>
>>
>>
>> On Fri, Nov 16, 2018 at 7:39 PM Ahmet Altay <al...@google.com
<mailto:al...@google.com>> wrote:
>>>
>>> Thank you for bringing this discussion back to the mailing list.
>>>
>>> On Fri, Nov 16, 2018 at 6:49 PM, Thomas Weise <t...@apache.org
<mailto:t...@apache.org>> wrote:
>>>>
>>>> We have observed instances of changes being reverted in master
that have been authored following the contributor guidelines and
pass all tests (post commit). While we generally seem to have quite
a bit of revert action happening [1], this thread is about those
instances that are outside of our documented policies.
>>>>
>>>> For a contributor, it isn't a good experience to see reverts
(especially not out of the blue) after a PR has been reviewed, all
tests pass and generally care has been taken to do the right things.
>>>
>>>
>>> I completely agree. Everyone involved needs to have the context
about why a change is being reverted. A JIRA with information is
probably a good way to do it, similar to the any other issue we track.
>>>
>>>>
>>>>
>>>> Changes can have unforeseen effects downstream. Usually
releases provide the opportunity to mitigate such issues, not
necessarily by a revert, but in many cases by another change that
keeps everyone happy. Unexpected reverts can break someone else and
are disruptive.
>>>
>>>
>>> I disagree about waiting until release to resolve an identified
issue. Let's say we become aware of an issue through channels other
than Beam tests (e.g. user reports, a contributor running into
something not captured in Beam tests) and we know that it is a
credible issue that will block the release anyway. Addressing the
issue sooner will be less painful than addressing later. (I am not
suggesting addressing every such issue similar to we do not block
releases on every open issue. There needs to be a due diligence to
understand the severity and the impact.)
>>>
>>> We can improve on the above process. If we end up reverting a
change as a result of a report that is not covered by existing Beam
tests, we could expand the tests to catch same class of problems
even earlier by means of Beam's own tests.
>>
>>
>> I'm referring to the release process as an example how such
issues can be addressed. I'm not suggesting to wait until a release
to address issues; as you say the sooner they are identified the better.
>>
>> However, I don't agree with taking out the revert hammer on the
slightest sign that there could be a downstream problem. There are
better ways to handle this. First of all, I think that these issues
should be visible on the dev mailing list so that there is more
awareness overall (which will lead to better test coverage and
useful feedback in general). Second, we should make an effort to
resolve issues in a forward looking way. If after all it turns out
that a revert is most appropriate for the situation, it should
follow explicit agreement.
>>
>>>
>>>
>>>>
>>>>
>>>> Some discussion already took place on one specific PR [3], but
that is just an example and by no means an isolated incident.
>>>>
>>>> On some of these reverts, "internal" problems
>>>
>>>
>>> This is a communication issue that needs be addressed. Over
time we are getting new contributors and that is great, but it also
means these new folks need to understand operating conditions of
this community. Giving direct feedback would generally be most
efficient here.
>>>
>>>>
>>>> with an important runner are cited, with little to no
explanation. It would be nice if folks with more insight can shed
more light on this.
>>>
>>>
>>> All runners we accepted at out master branch and include in our
releases is important. I do not find a reference to an important
runner in the examples other than a question about what would the
outcome be for a runner other than Dataflow. I think the outcome
would be the same, and in my opinion it should be. We need to be
careful not breaking any runner we support.
>>>
>>>>
>>>> I hope that as outcome of this discussion, we can arrive at a
better understanding of why and when such reverts were necessary and
possibly find ways to largely avoid them going forward.
>>>>
>>>> Thanks!
>>>> Thomas
>>>>
>>>>
>>>> [1]
https://github.com/apache/beam/search?o=desc&q=Revert&s=committer-date&type=Commits
>>>> [2] https://beam.apache.org/contribute/postcommits-policies/
>>>> [3] https://github.com/apache/beam/pull/7012
>>>>
>>>