damccorm opened a new pull request #16980:
URL: https://github.com/apache/beam/pull/16980


   This is part 1 of 2 to add bundle finalization support to the Go Sdk.
   
   **Summary of Overall Changes**
   
   Bundle finalization enables a DoFn to perform side effects after a runner 
has acknowledged that it has durably persisted the output. Right now, Java and 
Python support bundle finalization by allowing a user to register a callback 
function which is invoked when the runner acknowledges that it has persisted 
all output, but Go does not have any such support. This is part of a larger 
change to add support to the Go Sdk as outlined in this [design 
doc](https://docs.google.com/document/d/1dLylt36oFhsWfyBaqPayYXqYHCICNrSZ6jmr51eqZ4k/edit#).
   
   I've completed all the changes (sans some better testing on the parts not in 
this PR), you can see the remaining files in the diff here - 
https://github.com/apache/beam/compare/master...damccorm:users/damccorm/bundle-finalization?expand=1
   
   **Summary of Changes in this PR**
   
   This PR adds most of the non user facing changes needed to enable this 
change. There are basically 3 major components:
   
   1. Changes to exec/pardo.go and exec/fn.go to add the bundleFinalizer type, 
pass it into the user's DoFn when appropriate, and invoke the callbacks on 
finalization.
   1. Changes to harness.go and plan.go to manage plans that require 
finalization and respond to the runner sending the bundle finalization message.
   1. Adding FinalizeBundle and GetBundleExpirationTime to the Unit interface 
and all structs implementing it. This allows the FinalizeBundle command to 
propogate through the execution graph and allows us to get the time we can 
expire a bundle everywhere in the graph respectively. This cascaded into a 
bunch of small changes that are responsible for most of the file changes in 
this PR
   
   I would recommend reviewing the PR files in the order I just mentioned.
   
   **Additional testing done**
   
   On top of the units added, using my full implementation (not just this 
partial one) I also was able to run an E2E example on Dataflow (FWIW, not all 
runners have finalization support but I found that Dataflow does). In that 
example, I hijacked the wordcount example and added a bundleFinalizer to write 
a file to persistent storage for each line that had at least 3 words (chosen 
pretty randomly to minimize the chances of collisions). I'll omit the whole 
sample since its long, but it produced a bunch of files like this:
   
   <img width="259" alt="image" 
src="https://user-images.githubusercontent.com/42773683/156254068-adab3a92-7978-4284-9963-8b80c8ecbf59.png";>
   
   This indeed ran after the other data was persisted
   
   **Next Steps**
   
   After this, I'll add the user facing functionality in a follow up pr, along 
with testing for that and 1+ integration test for the whole flow.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
    - [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
    - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   
------------------------------------------------------------------------------------------------
   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to