[ 
https://issues.apache.org/jira/browse/BEAM-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169365#comment-17169365
 ] 

Beam JIRA Bot commented on BEAM-758:
------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it 
has been labeled "stale-P2". If this issue is still affecting you, we care! 
Please comment and remove the label. Otherwise, in 14 days the issue will be 
moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed 
explanation of what these priorities mean.


> Per-step, per-execution nonce
> -----------------------------
>
>                 Key: BEAM-758
>                 URL: https://issues.apache.org/jira/browse/BEAM-758
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: Not applicable
>            Reporter: Dan Halperin
>            Priority: P2
>              Labels: stale-P2
>
> In the forthcoming runner API, a user will be able to save a pipeline to JSON 
> and then run it repeatedly.
> Many pieces of code (e.g., BigQueryIO.Read or Write) rely on a single random 
> value (nonce). These values are typically generated at apply time, so that 
> they are deterministic (don't change across retries of DoFns) and global (are 
> the same across all workers).
> However, once the runner API lands the existing code would result in the same 
> nonce being reused across jobs. Other possible solutions:
> * Generate nonce in {{Create(1) | ParDo}} then use this as a side input. 
> Should work, as along as side inputs are actually checkpointed. But does not 
> work for {{BoundedSource}}.
> * If a nonce is only needed for the lifetime of one bundle, can be generated 
> in {{startBundle}} and used in {{finishBundle}} [or {{tearDown}}].
> * Add some context somewhere that lets user code access unique step name, and 
> somehow generate a nonce consistently e.g. by hashing. Will usually work, but 
> this is similarly not available to sources.
> Another Q: I'm not sure we have a good way to generate nonces in unbounded 
> pipelines -- we probably need one. This would enable us to, e.g., use 
> {{BigQueryIO.Write}} in an unbounded pipeline [if we had, e.g., exactly-once 
> triggering per window]. Or generalizing to multiple firings...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to