[
https://issues.apache.org/jira/browse/BEAM-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169365#comment-17169365
]
Beam JIRA Bot commented on BEAM-758:
------------------------------------
This issue is P2 but has been unassigned without any comment for 60 days so it
has been labeled "stale-P2". If this issue is still affecting you, we care!
Please comment and remove the label. Otherwise, in 14 days the issue will be
moved to P3.
Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed
explanation of what these priorities mean.
> Per-step, per-execution nonce
> -----------------------------
>
> Key: BEAM-758
> URL: https://issues.apache.org/jira/browse/BEAM-758
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Affects Versions: Not applicable
> Reporter: Dan Halperin
> Priority: P2
> Labels: stale-P2
>
> In the forthcoming runner API, a user will be able to save a pipeline to JSON
> and then run it repeatedly.
> Many pieces of code (e.g., BigQueryIO.Read or Write) rely on a single random
> value (nonce). These values are typically generated at apply time, so that
> they are deterministic (don't change across retries of DoFns) and global (are
> the same across all workers).
> However, once the runner API lands, the existing code would result in the same
> nonce being reused across jobs. Possible solutions:
> * Generate the nonce in {{Create(1) | ParDo}}, then use it as a side input.
> This should work, as long as side inputs are actually checkpointed, but it
> does not work for {{BoundedSource}}.
> * If a nonce is only needed for the lifetime of one bundle, can be generated
> in {{startBundle}} and used in {{finishBundle}} [or {{tearDown}}].
> * Add some context somewhere that lets user code access unique step name, and
> somehow generate a nonce consistently e.g. by hashing. Will usually work, but
> this is similarly not available to sources.
> Another Q: I'm not sure we have a good way to generate nonces in unbounded
> pipelines -- we probably need one. This would enable us to, e.g., use
> {{BigQueryIO.Write}} in an unbounded pipeline [if we had, e.g., exactly-once
> triggering per window]. Or generalizing to multiple firings...
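
As a rough illustration of the hashing idea in the third bullet of the quoted
description, here is a minimal sketch in plain Java. It assumes that a
per-execution identifier and a unique step name are both available to user
code, which is exactly the context the issue says is missing today. The class
StepNonce and its derive method are hypothetical names for illustration, not
Beam APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not part of Beam): derives a nonce by hashing. */
public final class StepNonce {
  private StepNonce() {}

  /**
   * Returns a 32-character hex nonce that is the same for every call with the
   * same inputs (stable across retries and workers) but differs between
   * executions and between steps. Both parameters are assumed to be supplied
   * by some context that does not exist in the SDK yet.
   */
  public static String derive(String executionId, String stepName) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      update(digest, executionId);
      update(digest, stepName);
      byte[] hash = digest.digest();
      // Keep the first 16 bytes (128 bits) and render them as hex.
      StringBuilder hex = new StringBuilder(32);
      for (int i = 0; i < 16; i++) {
        hex.append(String.format("%02x", hash[i] & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required to be present on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  // Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
  private static void update(MessageDigest digest, String field) {
    byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
    digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
    digest.update(bytes);
  }
}
```

Because the result depends only on its two inputs, every worker and every
retry computes the same value without coordination, while saving the pipeline
and running it again under a new execution identifier yields a fresh nonce.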