[
https://issues.apache.org/jira/browse/BEAM-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953835#comment-15953835
]
Kenneth Knowles commented on BEAM-197:
--------------------------------------
No, we should build a library transform and examples for incremental joins -
today with CoGBK it is impossible to use a speculative trigger and get correct
join results. This is one of the most important capabilities that state adds.
And since it is a bit advanced, the state blog post did not cover the topic.
I actually have a
[draft|https://github.com/kennknowles/beam/blob/stream-join/examples/java/src/main/java/org/apache/beam/examples/complete/StreamJoin.java]
already worked up for a join-matrix but it is more of a hacked example than a
library transform at the moment. I'm working on it now, a bit, to get the right
scaling behavior first before trying to polish any APIs.
> Incremental join
> ----------------
>
> Key: BEAM-197
> URL: https://issues.apache.org/jira/browse/BEAM-197
> Project: Beam
> Issue Type: Bug
> Components: beam-model
> Reporter: Mark Shields
> Assignee: Kenneth Knowles
>
> Consider a co-group by key over the two (streaming) collections:
> l : PCollection<KV<K, L>>
> r : PCollection<KV<K, R>>
> Each processElement sees a K, Iterable<L> and Iterable<R>.
> If the underlying trigger only allows a single PaneInfo.Timing.ON_TIME pane
> then it is trivial to calculate the traditional cross-product, including any
> of the inner/outer join combinations should Iterable<L> or Iterable<R> be
> empty.
> However if the underlying trigger supports speculative (ie
> PaneInfo.Timing.EARLY) or late (ie PaneInfo.Timing.LATE) panes then the
> corresponding speculative output panes are awkward to compute.
> (left_already_seen ++ new_left) X (right_already_seen ++ new_right)
> ==
> (left_already_seen X right_already_seen) ++
> (new_left X right_already_seen) ++
> (left_already_seen X new_right) ++
> (new_left X new_right)
> Currently the barrier between 'already seen' and 'new' must be maintained for
> left and right in per-window state. That suppresses some optimizations.
> This bug is for finding a cleaner way to express this combinator.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)