[ 
https://issues.apache.org/jira/browse/BEAM-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973335#comment-15973335
 ] 

Kenneth Knowles commented on BEAM-1197:
---------------------------------------

Just to zoom out: the first question is whether or not one wants correct join 
results. If so, past results must be reprocessed against the refreshed side 
data.

As with other stream-stream join techniques, you'll want a user-specified way 
of setting a horizon on this reprocessing. Using windows as the naive horizon 
(a) doesn't work for globally windowed joins, even with triggers, and (b) still 
has a Cartesian explosion, so it could be included only as a worst-case 
fallback.
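
To make the horizon idea concrete, here is a minimal in-memory sketch in plain 
Python. It is not Beam code and not a proposed API: the class `HorizonJoin`, 
its methods, and the `horizon` parameter are all hypothetical names chosen for 
illustration. It assumes main-input elements arrive in timestamp order and 
that the side data is a simple key/value lookup. Elements are buffered for at 
most `horizon` time units, and each side-data refresh re-emits join results 
only for elements still inside that horizon.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict, List, Tuple

Joined = Tuple[float, Any, Any, Any]  # (timestamp, key, value, side value)


@dataclass
class HorizonJoin:
    """Toy model of a join against refreshable side data (hypothetical)."""

    horizon: float  # how far back, in timestamp units, results are reprocessed
    side: Dict[Any, Any] = field(default_factory=dict)
    buffer: Deque[Tuple[float, Any, Any]] = field(default_factory=deque)

    def _expire(self, now: float) -> None:
        # Assumes in-order timestamps, so the oldest element is at the left.
        while self.buffer and self.buffer[0][0] < now - self.horizon:
            self.buffer.popleft()

    def on_element(self, ts: float, key: Any, value: Any) -> Joined:
        # Join against the current side data and remember the element
        # so it can be re-joined if the side data is refreshed later.
        self._expire(ts)
        self.buffer.append((ts, key, value))
        return (ts, key, value, self.side.get(key))

    def on_side_refresh(self, now: float, new_side: Dict[Any, Any]) -> List[Joined]:
        # Replace the side data, then re-emit results for every buffered
        # element that is still within the horizon.
        self.side = dict(new_side)
        self._expire(now)
        return [(ts, k, v, self.side.get(k)) for ts, k, v in self.buffer]
```

In this sketch the reprocessing cost per refresh is bounded by the number of 
elements inside the horizon rather than by the window, which is the property 
a globally windowed join would otherwise lack.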

> Slowly-changing external data as a side input
> ---------------------------------------------
>
>                 Key: BEAM-1197
>                 URL: https://issues.apache.org/jira/browse/BEAM-1197
>             Project: Beam
>          Issue Type: Wish
>          Components: beam-model
>            Reporter: Eugene Kirpichov
>
> I've repeatedly seen the following pattern: a user wants to join a 
> PCollection against a slowly-changing external dataset, e.g. a file on GCS 
> or a Bigtable table.
> Side inputs come to mind, but current side input mechanisms don't allow for 
> something like periodically reloading the side input.
> The best hacky solution I came up with for one use case is documented here: 
> http://stackoverflow.com/questions/41254028/can-dataflow-sideinput-be-updated-per-window-by-reading-a-gcs-bucket/41271159#41271159
> We need to do better than this.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
