[ 
https://issues.apache.org/jira/browse/BEAM-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949493#comment-15949493
 ] 

Kenneth Knowles commented on BEAM-1002:
---------------------------------------

I also think there is a case for state-like APIs here. Interestingly (though 
maybe not surprisingly), partitioning by window matters, while partitioning by 
key does not. This has a bit of the flavor of the fact that "keyed state" is 
keyed only to give some stable granularity for parallelism, while it is 
windowed for correctness.

This might be far from optimal in terms of pithiness, but here is the minimal 
deviation from the existing state API:

{code}
new DoFn<NotAKV, Whatever>() {
  @TransientStateId("fizzle")
  private final StateSpec<ValueState<MySideType>> globalSpec = ...

  /* Lazily read the side input when the transient state is gone, and write it
     to the state. */
}
{code}

We would discard it when the instance goes away (it might be corrupted, so we 
have to). Naively, it is mostly just a convenience on top of a {{Map}} in an 
instance field, is it not?
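
For comparison, a minimal sketch of that naive "{{Map}} in an instance field" 
version might look like the following. Everything here is an assumption for 
illustration: {{PerWindowCache}} and its methods are hypothetical names, not 
Beam API; {{W}} stands in for whatever window type the DoFn sees, and the 
loader stands in for reading the side input and building the derived object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Beam): caches one derived value per window
 * for the lifetime of the object that holds it.
 */
final class PerWindowCache<W, V> {
  // Lives only as long as this instance; nothing is persisted or checkpointed.
  private final Map<W, V> values = new HashMap<>();

  /** Returns the cached value for the window, invoking the loader only on a miss. */
  V get(W window, Function<W, V> loader) {
    return values.computeIfAbsent(window, loader);
  }

  /** Drops the entry for one window, e.g. once that window is known to be done. */
  void clear(W window) {
    values.remove(window);
  }

  /** Number of windows currently cached. */
  int size() {
    return values.size();
  }
}
```

A DoFn would presumably hold one of these in a transient field and call 
{{get}} from processElement with the element's window. The cache is keyed by 
window only, not by key, which matches the observation above.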

Differences I can see:
 - Less error-prone, of course
 - Could conceivably allow clearing it early, if we spec it out to be that way
 - For keyed merging windows we could insert the needed GBK for correctness

> Enable caching of side-input dependent computations
> ---------------------------------------------------
>
>                 Key: BEAM-1002
>                 URL: https://issues.apache.org/jira/browse/BEAM-1002
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>            Reporter: Robert Bradshaw
>            Assignee: Kenneth Knowles
>
> Sometimes the kind of computations one wants to perform in startBundle depend 
> on side inputs (and, implicitly, the window). For example, one might want to 
> initialize a (non-serializable) stateful object. In particular, this leads to 
> users incorrectly (in the case of triggered or non-globally-windowed side 
> inputs) memoizing this computation in the first processElement call. 
> One option would be to fold this into a customizable ViewFn. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
