[ https://issues.apache.org/jira/browse/BEAM-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736622#comment-15736622 ]
Daniel Halperin commented on BEAM-758: -------------------------------------- This indeed seems like a step along the right track. We also need this to be per-step. (Aka, so two different instances (A & B) of a transform in the pipeline consistently use the same ('A nonce' and 'B nonce') across machines, and 'A nonce' != 'B nonce'. > Per-step, per-execution nonce > ----------------------------- > > Key: BEAM-758 > URL: https://issues.apache.org/jira/browse/BEAM-758 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core > Affects Versions: Not applicable > Reporter: Daniel Halperin > Assignee: Sam McVeety > > In the forthcoming runner API, a user will be able to save a pipeline to JSON > and then run it repeatedly. > Many pieces of code (e.g., BigQueryIO.Read or Write) rely on a single random > value (nonce). These values are typically generated at apply time, so that > they are deterministic (don't change across retries of DoFns) and global (are > the same across all workers). > However, once the runner API lands the existing code would result in the same > nonce being reused across jobs. Other possible solutions: > * Generate nonce in {{Create(1) | ParDo}} then use this as a side input. > Should work, as along as side inputs are actually checkpointed. But does not > work for {{BoundedSource}}. > * If a nonce is only needed for the lifetime of one bundle, can be generated > in {{startBundle}} and used in {{finishBundle}} [or {{tearDown}}]. > * Add some context somewhere that lets user code access unique step name, and > somehow generate a nonce consistently e.g. by hashing. Will usually work, but > this is similarly not available to sources. > Another Q: I'm not sure we have a good way to generate nonces in unbounded > pipelines -- we probably need one. This would enable us to, e.g., use > {{BigQueryIO.Write}} in an unbounded pipeline [if we had, e.g., exactly-once > triggering per window]. Or generalizing to multiple firings... -- This message was sent by Atlassian JIRA (v6.3.4#6332)