[ https://issues.apache.org/jira/browse/BEAM-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17546169#comment-17546169 ]
Kenneth Knowles commented on BEAM-758:
--------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/18054
> Per-step, per-execution nonce
> -----------------------------
>
> Key: BEAM-758
> URL: https://issues.apache.org/jira/browse/BEAM-758
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Affects Versions: Not applicable
> Reporter: Dan Halperin
> Priority: P3
> Labels: Clarified
>
> In the forthcoming runner API, a user will be able to save a pipeline to JSON
> and then run it repeatedly.
> Many pieces of code (e.g., {{BigQueryIO.Read}} or {{BigQueryIO.Write}}) rely on a
> single random value (nonce). These values are typically generated at apply
> time, so that they are deterministic (they don't change across retries of
> DoFns) and global (they are the same across all workers).
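The apply-time pattern can be modeled in plain Java. In this minimal sketch, `ApplyTimeNonceTransform` is a hypothetical stand-in for a transform like `BigQueryIO.Write`, and Java serialization is an assumed substitute for saving the pipeline to JSON; it is not Beam's actual mechanism. The nonce is drawn once in the constructor, so every deserialized copy (every worker, every retry) sees the same value. That same property is what makes every reload of a saved pipeline reuse it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.UUID;

// Hypothetical stand-in for a transform that needs a nonce (e.g. to name
// temporary tables). The nonce is generated once, at "apply time", and then
// travels inside the serialized pipeline to every worker and every retry.
final class ApplyTimeNonceTransform implements Serializable {
  private static final long serialVersionUID = 1L;
  private final String nonce;

  ApplyTimeNonceTransform() {
    this.nonce = UUID.randomUUID().toString(); // drawn exactly once, at construction
  }

  String nonce() {
    return nonce;
  }

  // Simulates saving the pipeline and loading it again for a later run.
  // Java serialization is only a stand-in for the runner API's JSON form.
  static ApplyTimeNonceTransform saveAndReload(ApplyTimeNonceTransform t) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(t);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (ApplyTimeNonceTransform) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("save/reload round trip failed", e);
    }
  }
}
```

Saving once and reloading twice gives two "runs" that carry an identical nonce.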
> However, once the runner API lands, the existing code would reuse the same
> nonce across every run of a saved pipeline. Other possible solutions:
> * Generate the nonce in {{Create(1) | ParDo}}, then use it as a side input.
> This should work as long as side inputs are actually checkpointed, but it does
> not work for {{BoundedSource}}.
> * If a nonce is only needed for the lifetime of one bundle, it can be
> generated in {{startBundle}} and used in {{finishBundle}} [or {{tearDown}}].
> * Add some context that lets user code access a unique step name, and derive
> a nonce from it deterministically, e.g. by hashing. This would usually work,
> but such context is likewise unavailable to sources.
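The side-input option from the list above (compute the nonce once per execution, then distribute it) can be modeled without Beam as follows. `PipelineRun` and `nonceSideInput` are hypothetical names, and the map stands in for whatever checkpointing the runner actually provides, which is exactly the assumption that option depends on. In a real Beam pipeline the value would come from a one-element `Create` followed by a `ParDo`, read through a singleton side-input view.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of one pipeline execution. The nonce is produced by a
// single-element step the first time it is needed, and the result is
// "checkpointed" so that retried bundles read the same value.
final class PipelineRun {
  private final Map<String, String> checkpointedSideInputs = new ConcurrentHashMap<>();

  // Returns the side-input value for viewName, computing it at most once per run.
  String nonceSideInput(String viewName) {
    return checkpointedSideInputs.computeIfAbsent(
        viewName, name -> UUID.randomUUID().toString());
  }
}
```

Within one run, a retry that asks again gets the checkpointed value; a fresh `PipelineRun` (a new execution of the same saved pipeline) gets a new nonce.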
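For the bundle-scoped option in the list above, the lifecycle looks roughly like this sketch. `BundleScopedWriter` is a hypothetical class that only mirrors the shape of a `DoFn`'s `startBundle`/`processElement`/`finishBundle` methods; it is not Beam code. A retried bundle gets a fresh nonce, which is acceptable only because nothing outside the bundle depends on the value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical DoFn-like class: the nonce lives exactly as long as one bundle,
// e.g. to name a temporary file that finishBundle would commit or clean up.
final class BundleScopedWriter {
  private String bundleNonce;
  private final List<String> buffered = new ArrayList<>();

  void startBundle() {
    bundleNonce = UUID.randomUUID().toString(); // new value for every bundle attempt
    buffered.clear();
  }

  void processElement(String element) {
    buffered.add(element); // all elements of the bundle share bundleNonce
  }

  // Returns the (hypothetical) temp-file name this bundle's output would use.
  String finishBundle() {
    String tempFile = "tmp-" + bundleNonce + ".part";
    bundleNonce = null; // the nonce must not leak into the next bundle
    return tempFile;
  }

  int bufferedCount() {
    return buffered.size();
  }
}
```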
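The hashing option from the list above can be sketched with `java.util.UUID.nameUUIDFromBytes`, which deterministically maps a byte array to a name-based (MD5, version 3) UUID. The step name alone would repeat across runs of the same saved pipeline, so this sketch assumes the runner also exposes some per-execution identifier; both that identifier and the `StepNonces` helper are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper: derive a nonce deterministically from an assumed
// per-execution identifier (e.g. a job ID) plus the step's unique name.
// Every worker and every retry computes the same value, while a new
// execution of the same saved pipeline gets a different one.
final class StepNonces {
  private StepNonces() {}

  static String nonceFor(String executionId, String uniqueStepName) {
    // NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    String key = executionId + "\u0000" + uniqueStepName;
    return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
  }
}
```

Because each worker derives the value from the same inputs, no coordination or checkpointing is needed; sources, however, would still need access to the same context.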
> Another question: I'm not sure we have a good way to generate nonces in
> unbounded pipelines, and we probably need one. This would enable us to, e.g.,
> use {{BigQueryIO.Write}} in an unbounded pipeline [if we had, e.g., exactly-once
> triggering per window], or even to generalize to multiple firings per window.
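One possible direction for the unbounded case is to extend a deterministic derivation with the window and the firing index, so that a retry of a given firing recomputes the same nonce while each new firing gets its own. This is a hypothetical sketch: `FiringNonces`, the string `windowId`, and the per-execution identifier are all assumptions. In Beam, the firing index might plausibly come from the pane index (`PaneInfo.getIndex()`).

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical sketch: one nonce per (execution, step, window, firing).
// Retries of the same firing recompute the same value; a later firing of
// the same window, or any other window, gets a different one.
final class FiringNonces {
  private FiringNonces() {}

  static String nonceFor(String executionId, String stepName,
                         String windowId, long firingIndex) {
    // NUL-separated key so that adjacent fields cannot run together.
    String key = String.join("\u0000",
        executionId, stepName, windowId, Long.toString(firingIndex));
    return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
  }
}
```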
--
This message was sent by Atlassian Jira
(v8.20.7#820007)