lostluck opened a new pull request, #27550:
URL: https://github.com/apache/beam/pull/27550
Umbrella PR to make prism the default runner. Will be sending out smaller
slices of this separately for review.
This fixes several paper cuts in the Go SDK code where the prism runner
differs from the direct runner, as well as a few issues with built-ins running
on portable runners.
Critically, it avoids discrepancies between the direct runner implementation
and the expectations of other (portable) runners. While this does make getting
started slightly harder, it homogenizes the experience when moving from testing
to production Beam runners, which makes documenting correct procedures
consistent and provides consistent test implementations.
* Sets prism to be the default runner for the Go SDK.
* Replace direct uses of `direct` package with prism.
* Mark the direct runner package as Deprecated.
* TODO: Provide clear instructions for what to do/use instead.
* TODO: Run all the examples with prism.
* Update the SDK to require Go 1.20.
* This is due to prism's exp/slog dependency, which will be migrated to the
standard library version with Go 1.21.
* Adds a prism stanza for the integration tests
* TODO: Add a Jenkins target and run it as a precommit (like the Python
Portable runs).
* Ensure all unit tested DoFns are registered. Update some things to use the
generic registration package. A full conversion to the register package
remains outstanding work.
* Add calls to ptest.Main in TestMains for package unit tests that use ptest.
* The datastore and fhirio packages stay on the "direct" runner because they
don't have portable testing implementations. Those issues have been filed
separately. This only pertains to their ability to test.
* TODO: Fix top.accum to not assume it's in a fused context.
* Update the debug string for ParDo execution nodes to list side inputs.
* Update the debug string for side inputs to list their coder.
* Clarify that harness ProcessBundle handling is the error source when the
failure comes from materializing ProcessBundleDescriptors to exec.Plans.
* Ensure data handling errors that end up at harness are actually reported
as errors that fail a bundle.
* Update the universal runner to track the last error message over the
stream, and return that as the pipeline failure message. Alternatively, wait
until the stream terminates or an error message is received after the JobState
is Failed. This allows prism (and other portable runners) to produce the
failure cause cleanly to the launching task (if it's still up).
* Have the universal runner wrapper avoid the Go binary compile step when
the execution is set to Loopback mode. Generates a temporary empty file and
sets that as the "worker_binary". This avoids waiting on compiles between each
test, when the binary would not even be used anyway.
* Adds "Loopback" as a field option to the runnerlib package.
* Prism Fixes
* Ensuring coders for SideInputs are properly set if their PCollection
coder is replaced. Otherwise this would lead to problems on decoding.
* Moves test DoFns into an internal_test package. Otherwise there's a
circular dependency with importing prism into ptest for executing pipelines
with prism as default.
* Track the "pipeline terminating failure" and ensure that is logged as an
Error class message over the message stream.
* Have SDK errors on Progress or Split abort the progress loop. Doesn't
abort bundle processing, but prevents receiving tentative results. Prevents
non-termination of prism jobs on failures, when stuck in the loop.
* Since SDK errors are now relayed back to the launching process as log
messages, no longer log them at the prism end. Removes verbose duplicate
messages.
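
The universal runner change above (track the last error message on the stream and return it as the pipeline failure) can be sketched roughly as follows. This is a simplified model under stated assumptions: the `msg` type, the state strings, and `waitForJob` are hypothetical stand-ins for the real job message stream, which is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// msg is a hypothetical stand-in for one response on the job message
// stream: either a state change or a log message.
type msg struct {
	state   string // e.g. "RUNNING", "FAILED", "DONE"; empty for log messages
	isError bool   // true for error-class log messages
	text    string
}

// waitForJob drains the stream, remembering the last error-class message.
// Once the job is FAILED it returns as soon as an error message is known,
// whether that message arrived before or after the state change.
func waitForJob(stream <-chan msg) error {
	lastErr := ""
	failed := false
	for m := range stream {
		if m.isError {
			lastErr = m.text
		}
		switch m.state {
		case "FAILED":
			failed = true
		case "DONE":
			return nil
		}
		if failed && lastErr != "" {
			return fmt.Errorf("job failed: %s", lastErr)
		}
	}
	// The stream terminated without a usable error message.
	if failed {
		return errors.New("job failed with no error message")
	}
	return errors.New("message stream closed before a terminal state")
}

func main() {
	stream := make(chan msg, 3)
	stream <- msg{state: "RUNNING"}
	stream <- msg{state: "FAILED"}
	stream <- msg{isError: true, text: "DoFn panicked"} // cause arrives late
	close(stream)

	fmt.Println(waitForJob(stream))
}
```

Waiting for either the stream to end or a late error message is what lets the launching process report the actual failure cause rather than a bare "failed" state.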
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.