Top-post comment: Aggregation of test results is hard. We've had a million
threads on it. You want to have a clear green "this runner works" signal
but you also want completely isolated "this runner works for Java" / "this
runner works for Python" / etc signals. The tension between these is
essential complexity. We have gotten drawn into that discussion here, and we
shouldn't. This proposal does not change anything about how we approach
that problem. It addresses these goals:

 - make it obvious how to reproduce a signal via GH comment (unanimous
consensus so far on addressing this)
 - decouple the trigger for a signal from the identifier for the signal
(that's the pre/postcommit removal that is also consensus)
 - make it obvious how to reproduce a signal without GHA
 - make it obvious what a signal represents (arguably identical to the
above, but some comments below suggest it is in tension)
 - don't waste resources

There's light objection to naming one-liner (or gradle task) workflows after
the one-liner (or gradle task) they run, and some conflation of that part of
the proposal with the problem of signal aggregation.

I think there are two alternatives proposed:

 - When it is obvious, name a test signal after the one-liner that
reproduces it; otherwise do your best, and fall back to abstract names
 - Use abstract names primarily (the fact that we make them look like
natural language is fine but illusory)

Personally:

 - I dislike having to code search for the name of a signal to find the GHA
file in order to pull out how to reproduce it.
 - I also think that "XVR_GoUsingJava_Dataflow" is just as hard to
understand as
":runners:google-cloud-dataflow-java:validatesCrossLanguageRunnerGoUsingJava"

These are personal perspectives.

I can discern values behind the positions here:

 - GHA is just automation to run test suites that should be easily runnable
in other contexts (including, say, migrating from GHA to whatever comes
after)
 - GHA is a place we build the logic and abstractions of our test suites

I fall strongly in the first camp. I think the second one is just a tight
coupling with a weirdly strictly-negative value proposition (even writing
and testing a shell script is easier in a shell script than in yaml).
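
To make the first camp concrete, here is a minimal sketch of what "GHA as
thin automation" could look like. The trigger wiring is illustrative and
simplified, not a proposal for exact syntax:

  # Illustrative sketch only: the workflow is a thin trigger around a
  # single command, so the suite runs identically with or without GHA.
  name: ":runners:google-cloud-dataflow-java:validatesRunner"
  on:
    issue_comment:
      types: [created]
  jobs:
    test:
      if: github.event.comment.body == 'Run :runners:google-cloud-dataflow-java:validatesRunner'
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        # The status label, the trigger phrase, and the local repro are
        # all the same command:
        - run: ./gradlew :runners:google-cloud-dataflow-java:validatesRunner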

Kenn

p.s. inline commentary because there seem to be a lot of misconceptions
about Gradle (which, to be clear, I am *not* a huge supporter or promoter
of!) and my detailed opinions about the anti-pattern of building a lot of
logic in your CI



> On Tue, Oct 10, 2023, 8:49 AM Danny McCormick via dev <dev@beam.apache.org>
> wrote:
>
>> Ah cool - I'm still -1 on this, though less strongly than if it was
>> actually tying it to Gradle. Most of these would probably be gradle
>> commands anyways as proposed. It is often a good idea to have workflows
>> that have multiple consecutive gradle commands (also addressing Robert's
>> point here) so there's not a single repro command.
>>
>
It is actually *rarely* a good idea to have correctness testing workflows
that have multiple consecutive gradle commands, and even more rarely a good
idea to have correctness testing workflows that are not trivial to
reproduce outside of CI. It usually means you are holding it (both Gradle
and CI) wrong. Other workflows, sure.

But this proposal does not propose any changes to either of those cases.


>> 1) A job that runs the same performance test on a remote runner with a
>> bunch of different configs. A single gradle task has the downsides of less
>> clear observability (which task actually caused the failed build?) and
>> resource wasting on failures (early failures keep us from using as many
>> resources running multiple possibly expensive jobs).
>>
>
I must not understand what you mean, because the premises seem false.

 - Gradle very much can halt on the first failure and can in fact fail
sooner because that failure can be arrived at in parallel.
 - Subtasks of a single gradle task are trivially observable via a Gradle build scan.
In fact it is the *only* place in our stack that has this feature.
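
For example, a sketch (the flags are standard Gradle; whether these
particular suites belong in one invocation is a separate question):

  # One invocation, several suites. By default Gradle halts when a task
  # fails (--continue overrides that), independent tasks can run in
  # parallel with --parallel, and --scan publishes a build scan with
  # per-task, per-test results.
  ./gradlew --parallel --scan \
      :runners:google-cloud-dataflow-java:validatesRunner \
      :runners:google-cloud-dataflow-java:validatesRunnerV2Streaming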

But the workflows you refer to, if I recall, primarily take different
environmental parameters. Again, for emphasis, I'm not proposing to change
them. Whether it is useful to create specific gradle subtasks that fix
parameters to particular values is an interesting debate with no right
answer.


>> 2) A single runner validation suite that runs Java/Python/Go tests for that
>> runner. We don't really do this much, but it is a totally reasonable
>> (desirable?) way to structure tests.
>>
>
I can't tell what you mean by this. If you mean we want aggregated signal
that a runner meets the basic spec across "everything" then yes. If you
mean that we want granular signal that a runner meets the basic spec as
expressed by a particular SDK then also yes.


>> 3) An expensive job that has one command to run a small set of tests and
>> then another to run a more expensive set only if the initial one passed.
>>
>
This makes sense conceptually, but I am not aware of a concrete instance of
it. In practice how it works for us is that lighter-weight hermetic tests
and sometimes highly targeted medium-weight integration tests (such as for
a specific IO being edited) will run on every change to a PR, while all
heavier-weight tests are by request. My proposal leaves this as-is. It
actually would be cool to have the heavyweight workflow show up as
"failed/required" but only run on demand. Because you want to wait until
"ready" to run the heavyweight thing. You don't want it to necessary kick
off just as soon as the hermetic tests pass. Anyhow, this is just
hypothetical.



>> It's also worth mentioning that often the command to run a step is quite
>> long, especially for things like perf tests that have lots of flags to pass.
>>
>
Agree with this. Might be a case where you choose some more abbreviated
identifier.


>> All of these examples *could* be bundled into a single gradle command, but
>> they shouldn't have to be. Instead we should have a workflow interface that
>> is as independent of implementation as possible IMO and represents an
>> abstraction of what we actually want to test (e.g. "Run Dataflow Runner
>> Tests", or "Run Java Examples Tests"). This also avoids us taking a
>> dependency that could go out of date if we change our commands.
>>
>
I agree with the goal here. But I don't believe we should use CI / GHA to
build an abstraction of "tasks that you can request by name to be run". We
already have a dependency-driven build system with that same characteristic
:-). For CD stuff, sure. But for test signals they really ought to be basic
test suites that can be invoked on the command line, ideally without
parameters.
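
If an abstract name is genuinely wanted, it can live in the build system
rather than in CI. A hypothetical sketch (the aggregate task name is made
up for illustration; the dependencies are real tasks):

  // Hypothetical: an abstract, human-friendly suite name defined as a
  // Gradle aggregate task instead of in yaml, so it stays runnable on
  // the command line as ./gradlew dataflowRunnerTests
  tasks.register('dataflowRunnerTests') {
      dependsOn ':runners:google-cloud-dataflow-java:validatesRunner'
      dependsOn ':runners:google-cloud-dataflow-java:validatesRunnerV2Streaming'
  }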



>> Mostly fixing stuff like how the status label "Google Cloud Dataflow
>> Runner V2 Java ValidatesRunner Tests (streaming)" is literally a less
>> useful way of writing
>> ":runners:google-cloud-dataflow-java:validatesRunnerV2Streaming".
>>
>> Last thing I'll add - this is true for you and probably many
>> contributors, but is less friendly for new folks who are less familiar with
>> the project IMO (especially when the filepath/command is less obvious).
>>
>
Agree to disagree :-). They are both the same words, just one is a soup and
one is a task name :-)

--end--


>
>> On Tue, Oct 10, 2023 at 12:29 PM Robert Burke <rob...@frantil.com> wrote:
>>
>>> +1 to the general proposal.
>>>
>>> I'm not bothered if something says a gradle command and in execution,
>>> that task ends up running multiple different commands. Arguably, if we're
>>> running a gradle task manually to prepare for a subsequent task, that latter
>>> task should be adding the former to its dependencies.
>>>
>>> Also agree that this is a post Jenkins exit task. One migration in an
>>> area at a time please.
>>>
>>> On Tue, Oct 10, 2023, 8:07 AM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> On Tue, Oct 10, 2023 at 10:21 AM Danny McCormick via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> I'm +1 on:
>>>>> - standardizing our naming
>>>>> - making job names match their commands exactly (though I'd still like
>>>>> the `Run` prefix so that you can do things like say "Suite XYZ failed"
>>>>> without triggering the automation)
>>>>> - removing pre/postcommit from the naming (we actually already run
>>>>> many precommits as postcommits as well)
>>>>
>>>>
>>>> Fully agree with your point about keeping "Run" as the magic word and the
>>>> way we have it today.
>>>>
>>>>> I'm -0 on:
>>>>
>>>>> - Doing this immediately - I'd prefer we wait til the Jenkins to
>>>>> Actions migration is done and we can do this in bulk versus renaming 
>>>>> things
>>>>> as we go since we're so close to the finish line and exact parity makes
>>>>> reviews easier.
>>>>
>>>>
>>>> Cool. And indeed this is why I didn't "just do it" (aside from this
>>>> being enough impact to people's daily lives that I wanted to get feedback
>>>> from dev@). If we can fold it in as a last step to the migration, that
>>>> would be a nice-to-have. Otherwise ping back when ready please :-)
>>>>
>>>> On Tue, Oct 10, 2023 at 11:15 AM Yi Hu via dev <dev@beam.apache.org>
>>>> wrote:
>>>>
>>>>> Thanks for raising this. This generally works, though some jobs run
>>>>> more than one gradle task (e.g. some IO_Direct_PreCommit run both :build
>>>>> (which executes unit tests) and :integrationTest).
>>>>>
>>>>
>>>> On Tue, Oct 10, 2023 at 10:21 AM Danny McCormick via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> I'm -1 on:
>>>>> - Tying the naming to gradle - like Yi mentioned some workflows have
>>>>> multiple gradle tasks, some don't have any, and I think that's ok.
>>>>>
>>>>
>>>> Just to clarify: I'm not proposing tying them to gradle tasks (I'm fine
>>>> with `go test` for example) or doing this in situations where it is
>>>> unnatural.
>>>>
>>>> My example probably confused this because I left off the `./gradlew`
>>>> just to save space. I'm proposing naming them after their obvious repro
>>>> command, wherever applicable. Mostly fixing stuff like how the status label
>>>> "Google Cloud Dataflow Runner V2 Java ValidatesRunner Tests (streaming)" is
>>>> literally a less useful way of writing
>>>> ":runners:google-cloud-dataflow-java:validatesRunnerV2Streaming".
>>>>
>>>> FWIW I think Yi's example demonstrates an anti-pattern (mixing
>>>> hermetic/reliable and non-hermetic/possibly-flaky tests in one test signal)
>>>> but I'm sure there are indeed jobs where this doesn't make sense.
>>>>
>>>> Kenn
>>>>
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Danny
>>>>>
>>>>>
>>>>>>
>>>>>> Another option is to normalize the naming of every job, saying the
>>>>>> job name is X, then workflow name is PreCommit_X or PostCommit_X, and the
>>>>>> phrase is Run X. Currently most PreCommit follow this pattern, but there
>>>>>> are also many outliers. A good start could be to clean up all jobs
>>>>>> to follow the same pattern.
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 10, 2023 at 9:57 AM Kenneth Knowles <k...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> FWIW I am aware of the README in
>>>>>>> https://github.com/apache/beam/tree/master/.test-infra/jenkins that
>>>>>>> lists the phrases alongside the jobs. This is just wasted work to 
>>>>>>> maintain
>>>>>>> IMO.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Tue, Oct 10, 2023 at 9:46 AM Kenneth Knowles <k...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> *Proposal:* make all the job names exactly match the GH comment to
>>>>>>>> run them and make it also as close as possible to how to reproduce 
>>>>>>>> locally
>>>>>>>>
>>>>>>>> *Example problems*:
>>>>>>>>
>>>>>>>>  - We have really silly redundant job results like 'Chicago Taxi
>>>>>>>> Example on Dataflow ("Run Chicago Taxi on Dataflow")' and
>>>>>>>> 'Python_Xlang_IO_Dataflow ("Run Python_Xlang_IO_Dataflow PostCommit")'
>>>>>>>>
>>>>>>>>  - We have jobs where there's no way you could guess the command, like
>>>>>>>> 'Google Cloud Dataflow Runner V2 Java ValidatesRunner Tests 
>>>>>>>> (streaming)'
>>>>>>>>
>>>>>>>>  - (nit) We are weirdly inconsistent about using spaces vs
>>>>>>>> underscores. I don't think any of our infrastructure cares about this.
>>>>>>>>
>>>>>>>> *Extra proposal*: make the job name also the local command, where
>>>>>>>> possible
>>>>>>>>
>>>>>>>> *Example: *
>>>>>>>> https://github.com/apache/beam/blob/master/.github/workflows/beam_PostCommit_Java_ValidatesRunner_Dataflow.yml
>>>>>>>>
>>>>>>>>  - This runs :runners:google-cloud-dataflow-java:validatesRunner
>>>>>>>>  - So make the status label
>>>>>>>> ":runners:google-cloud-dataflow-java:validatesRunner"
>>>>>>>>  - "Run :runners:google-cloud-dataflow-java:validatesRunner" as
>>>>>>>> comment
>>>>>>>>
>>>>>>>> If I want to run it locally, yes there are GCP things I have to set
>>>>>>>> up, but I know the gradle command now.
>>>>>>>>
>>>>>>>> *Corollary*: remove "postcommit" and "precommit" from names,
>>>>>>>> because whether a suite runs before merge or after merge is not a 
>>>>>>>> property
>>>>>>>> of the suite.
>>>>>>>>
>>>>>>>> *Caveats*: I haven't been that involved. I didn't do this to
>>>>>>>> Jenkins because they are going away. I didn't do anything to GHA 
>>>>>>>> because I
>>>>>>>> don't know if they are ready or in flux.
>>>>>>>>
>>>>>>>> I know this is the sort of thing that invites bikeshedding. It just
>>>>>>>> would save me a few minutes when puzzling out what to care about and 
>>>>>>>> how to
>>>>>>>> kick jobs on the release branch validation PR.
>>>>>>>>
>>>>>>>> I'm happy to scrape through the existing stuff and align it.
>>>>>>>> Perfect task for when my brain is too tired for other work.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>
