[
https://issues.apache.org/jira/browse/BEAM-10976?focusedWorklogId=734968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734968
]
ASF GitHub Bot logged work on BEAM-10976:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Mar/22 21:46
Start Date: 01/Mar/22 21:46
Worklog Time Spent: 10m
Work Description: damccorm opened a new pull request #16980:
URL: https://github.com/apache/beam/pull/16980
This is part 1 of 2 to add bundle finalization support to the Go Sdk.
**Summary of Overall Changes**
Bundle finalization enables a DoFn to perform side effects after a runner
has acknowledged that it has durably persisted the output. Right now, Java and
Python support bundle finalization by allowing a user to register a callback
function which is invoked when the runner acknowledges that it has persisted
all output, but Go does not have any such support. This is part of a larger
change to add support to the Go Sdk as outlined in this [design
doc](https://docs.google.com/document/d/1dLylt36oFhsWfyBaqPayYXqYHCICNrSZ6jmr51eqZ4k/edit#).
I've completed all the changes (sans some better testing on the parts not in
this PR), you can see the remaining files in the diff here -
https://github.com/apache/beam/compare/master...damccorm:users/damccorm/bundle-finalization?expand=1
**Summary of Changes in this PR**
This PR adds most of the non user facing changes needed to enable this
change. There are basically 3 major components:
1. Changes to exec/pardo.go and exec/fn.go to add the bundleFinalizer type,
pass it into the user's DoFn when appropriate, and invoke the callbacks on
finalization.
1. Changes to harness.go and plan.go to manage plans that require
finalization and respond to the runner sending the bundle finalization message.
1. Adding FinalizeBundle and GetBundleExpirationTime to the Unit interface
and all structs implementing it. This allows the FinalizeBundle command to
propogate through the execution graph and allows us to get the time we can
expire a bundle everywhere in the graph respectively. This cascaded into a
bunch of small changes that are responsible for most of the file changes in
this PR
I would recommend reviewing the PR files in the order I just mentioned.
**Additional testing done**
On top of the units added, using my full implementation (not just this
partial one) I also was able to run an E2E example on Dataflow (FWIW, not all
runners have finalization support but I found that Dataflow does). In that
example, I hijacked the wordcount example and added a bundleFinalizer to write
a file to persistent storage for each line that had at least 3 words (chosen
pretty randomly to minimize the chances of collisions). I'll omit the whole
sample since its long, but it produced a bunch of files like this:
<img width="259" alt="image"
src="https://user-images.githubusercontent.com/42773683/156254068-adab3a92-7978-4284-9963-8b80c8ecbf59.png">
This indeed ran after the other data was persisted
**Next Steps**
After this, I'll add the user facing functionality in a follow up pr, along
with testing for that and 1+ integration test for the whole flow.
------------------------
Thank you for your contribution! Follow this checklist to help us
incorporate your contribution quickly and easily:
- [ ] [**Choose
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA
issue, if applicable. This will automatically link the pull request to the
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
See the [Contributor Guide](https://beam.apache.org/contribute) for more
tips on [how to make review process
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
To check the build health, please visit
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
GitHub Actions Tests Status (on master branch)
------------------------------------------------------------------------------------------------
[](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
[](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
[](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more
information about GitHub Actions CI.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 734968)
Remaining Estimate: 0h
Time Spent: 10m
> Enable Bundle Finalization in Go SDK
> ------------------------------------
>
> Key: BEAM-10976
> URL: https://issues.apache.org/jira/browse/BEAM-10976
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Danny McCormick
> Priority: P3
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Eg. to support acking pubsub/kafka messages as processed after the results
> have been properly committed by the runner.
> Note, that due to BEAM-10959 that when implementing this, an instruction must
> remain "active" until it's finalization occurs as well. Specifically, we
> should probably keep another map around for "to be finalized" process bundle
> instructions so we can return the appropriate "empty" response and not
> accidently evict them from the nearly equivalent inactive state until after
> finalization.
> [https://s.apache.org/beam-finalizing-bundles]
>
> (To be updated once [https://github.com/apache/beam/pull/13160] is merged and
> the programming guide updated with SDF content.)
> See also Java and Python approaches
> https://beam.apache.org/documentation/programming-guide/#bundle-finalization
--
This message was sent by Atlassian Jira
(v8.20.1#820001)