[
https://issues.apache.org/jira/browse/BEAM-10976?focusedWorklogId=738317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-738317
]
ASF GitHub Bot logged work on BEAM-10976:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 08/Mar/22 20:35
Start Date: 08/Mar/22 20:35
Worklog Time Spent: 10m
Work Description: damccorm opened a new pull request #17045:
URL: https://github.com/apache/beam/pull/17045
This is a continuation of the effort to add Bundle Finalization started in
#16980
**Summary of Overall Changes**
Bundle finalization enables a DoFn to perform side effects after a runner
has acknowledged that it has durably persisted the output. Right now, Java and
Python support bundle finalization by allowing a user to register a callback
function which is invoked when the runner acknowledges that it has persisted
all output, but Go does not have any such support. This is part of a larger
change to add support to the Go Sdk as outlined in this [design
doc](https://docs.google.com/document/d/1dLylt36oFhsWfyBaqPayYXqYHCICNrSZ6jmr51eqZ4k/edit#).
**Summary of Changes in this PR**
I completed most of the execution changes in the last pr, leaving this PR to
handle the user experience and plumbing the user's bundle finalization
parameter through to the execution layer.
**Additional testing done**
On top of the units added, I also was able to run an E2E example on Dataflow
(FWIW, [only Dataflow currently has bundle finalization
support](https://beam.apache.org/documentation/runners/capability-matrix/bounded-splittable-dofn-support-status/)).
In that example, I hijacked the wordcount example and added a bundleFinalizer
to write a file to persistent storage for each line that had at least 3 words
(chosen pretty randomly to minimize the chances of collisions). I'll omit the
whole sample since its long, but it produced a bunch of files like this:
<img width="259" alt="image"
src="https://user-images.githubusercontent.com/42773683/156254068-adab3a92-7978-4284-9963-8b80c8ecbf59.png">
This indeed ran after the other data was persisted.
I decided not to add an integration test because of the complexity involved.
Because only the dataflow runner supports finalization, any integration test
would require finding some way to (a) modify dataflow state and query that
state (probably not a great option since it requires modifying a devs personal
GCP account, and its not obvious what the best thing to do actually would be
without knowing more about their config), (b) creating some sort of local
endpoint for dataflow to talk back to (technically feasible, but definitely
non-trivial - would also add significant complexity outside of what we're
actually testing), or (c) using some 3rd party to communicate between the 2
(non-ideal since it adds an extra dependency that isn't part of what's being
tested just for this scenario). I'm definitely open to doing this, but at the
moment with the information I have it doesn't feel worth it.
**Next Steps**
After this, I'll update the documentation here to include an example
https://github.com/apache/beam/blob/6438626c059c19ff9ca32cd834d0aa62253e531b/website/www/site/content/en/documentation/programming-guide.md#127-bundle-finalization-bundle-finalization
------------------------
Thank you for your contribution! Follow this checklist to help us
incorporate your contribution quickly and easily:
- [ ] [**Choose
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA
issue, if applicable. This will automatically link the pull request to the
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
See the [Contributor Guide](https://beam.apache.org/contribute) for more
tips on [how to make review process
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
To check the build health, please visit
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
GitHub Actions Tests Status (on master branch)
------------------------------------------------------------------------------------------------
[](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
[](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
[](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more
information about GitHub Actions CI.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 738317)
Time Spent: 7h 10m (was: 7h)
> Enable Bundle Finalization in Go SDK
> ------------------------------------
>
> Key: BEAM-10976
> URL: https://issues.apache.org/jira/browse/BEAM-10976
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Danny McCormick
> Priority: P3
> Time Spent: 7h 10m
> Remaining Estimate: 0h
>
> Eg. to support acking pubsub/kafka messages as processed after the results
> have been properly committed by the runner.
> Note, that due to BEAM-10959 that when implementing this, an instruction must
> remain "active" until it's finalization occurs as well. Specifically, we
> should probably keep another map around for "to be finalized" process bundle
> instructions so we can return the appropriate "empty" response and not
> accidently evict them from the nearly equivalent inactive state until after
> finalization.
> [https://s.apache.org/beam-finalizing-bundles]
>
> (To be updated once [https://github.com/apache/beam/pull/13160] is merged and
> the programming guide updated with SDF content.)
> See also Java and Python approaches
> https://beam.apache.org/documentation/programming-guide/#bundle-finalization
--
This message was sent by Atlassian Jira
(v8.20.1#820001)