[
https://issues.apache.org/jira/browse/BEAM-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852940#comment-15852940
]
Sam McVeety commented on BEAM-758:
----------------------------------
After further conversations, it seems that ProcessContext is a more desirable
place to put this. As an amended proposal, what do folks think about proving
the following in ProcessContext:
/** Provides a nonce that is unique and stable for this job execution instance.
**/
String getJobNonce();
/** Provides a nonce that is unique and stable for this step. **/
String getStepId();
Between the two, these allow for both stable, shared values across multiple
steps as needed, as well as step-unique values.
> Per-step, per-execution nonce
> -----------------------------
>
> Key: BEAM-758
> URL: https://issues.apache.org/jira/browse/BEAM-758
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Affects Versions: Not applicable
> Reporter: Daniel Halperin
> Assignee: Sam McVeety
>
> In the forthcoming runner API, a user will be able to save a pipeline to JSON
> and then run it repeatedly.
> Many pieces of code (e.g., BigQueryIO.Read or Write) rely on a single random
> value (nonce). These values are typically generated at apply time, so that
> they are deterministic (don't change across retries of DoFns) and global (are
> the same across all workers).
> However, once the runner API lands the existing code would result in the same
> nonce being reused across jobs. Other possible solutions:
> * Generate nonce in {{Create(1) | ParDo}} then use this as a side input.
> Should work, as along as side inputs are actually checkpointed. But does not
> work for {{BoundedSource}}.
> * If a nonce is only needed for the lifetime of one bundle, can be generated
> in {{startBundle}} and used in {{finishBundle}} [or {{tearDown}}].
> * Add some context somewhere that lets user code access unique step name, and
> somehow generate a nonce consistently e.g. by hashing. Will usually work, but
> this is similarly not available to sources.
> Another Q: I'm not sure we have a good way to generate nonces in unbounded
> pipelines -- we probably need one. This would enable us to, e.g., use
> {{BigQueryIO.Write}} in an unbounded pipeline [if we had, e.g., exactly-once
> triggering per window]. Or generalizing to multiple firings...
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)