I have been working on the protocol for splitting/checkpointing of bundles
for use with SplittableDoFn, but in the meantime I wanted to share a
proposal for bundle finalization[1].

Bundle finalization solves a problem with integrations against external
systems that require acknowledgement (such as queue-based sources):
acknowledgements should only be sent once the output of a bundle has been
durably persisted. The idea is that after a bundle completes and the runner
durably persists its output, a best-effort finalization call is made back to
the same SDK harness instance. This allows the SDK harness to send any
"acknowledgements" to the external system. Because finalization can fail,
the external system must be able to restore anything which wasn't
acknowledged.
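
To make the ordering concrete, here is a rough sketch of the flow in plain
Python. It is not the Beam API and not the protocol from the doc; FakeQueue,
process_bundle, run_bundle and the finalize_fails flag are all hypothetical
names made up for illustration. It only shows "persist first, then
best-effort ack, and let the external system redeliver on failure".

```python
class FakeQueue:
    """Hypothetical stand-in for an external system that requires acks."""

    def __init__(self, messages):
        self._pending = list(messages)  # not yet delivered
        self._in_flight = {}            # delivered but unacked: id -> payload
        self._next_id = 0

    def pull(self, max_messages):
        batch = []
        while self._pending and len(batch) < max_messages:
            payload = self._pending.pop(0)
            msg_id = self._next_id
            self._next_id += 1
            self._in_flight[msg_id] = payload
            batch.append((msg_id, payload))
        return batch

    def ack(self, msg_ids):
        for msg_id in msg_ids:
            self._in_flight.pop(msg_id, None)

    def in_flight_count(self):
        return len(self._in_flight)

    def redeliver_unacked(self):
        # Models the external system restoring anything never acknowledged
        # (for example after an ack deadline expires).
        self._pending = list(self._in_flight.values()) + self._pending
        self._in_flight.clear()


def process_bundle(queue, max_messages):
    """SDK harness side (assumed): produce output plus a finalize callback."""
    batch = queue.pull(max_messages)
    outputs = [payload.upper() for _, payload in batch]
    msg_ids = [msg_id for msg_id, _ in batch]

    def finalize():
        queue.ack(msg_ids)

    return outputs, finalize


def run_bundle(queue, durable_store, max_messages, finalize_fails=False):
    """Runner side (assumed): persist output, then finalize once, no retry."""
    outputs, finalize = process_bundle(queue, max_messages)
    durable_store.extend(outputs)  # output is durably persisted first
    try:
        if finalize_fails:
            raise ConnectionError("simulated lost finalization call")
        finalize()
        return True
    except ConnectionError:
        return False  # best effort: the unacked messages stay in flight
```

Calling run_bundle(queue, store, 2, finalize_fails=True) leaves the pulled
messages unacknowledged; a later queue.redeliver_unacked() makes them
available again. Those messages would then be processed a second time, and
the sketch does not try to handle the resulting duplicates.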

I also discuss why I don't believe we gain much by providing "guaranteed"
finalization. Please take a look at the doc I shared and feel free to
comment.

1: https://s.apache.org/beam-finalizing-bundles
