[ https://issues.apache.org/jira/browse/STORM-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181944#comment-16181944 ]
Arun Mahadevan commented on STORM-2761:
---------------------------------------
As I understand it, tuples are buffered in both streams and joined only once,
when the window triggers.
For example, with a 1 min window, all tuples that arrived in "stream1" during
the last 1 min are joined with all tuples that arrived in "stream2" during the
last 1 min when the window completes. If it does not work that way, there
might be a bug.
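To make the buffering behaviour described above concrete, here is a minimal
sketch in plain Java. It is not Storm's JoinBolt code and uses no Storm API;
the class WindowJoinSketch, its Tuple type, and the method names are all
hypothetical, and it assumes a tumbling window whose buffers are emptied after
each trigger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Storm's JoinBolt implementation.
final class WindowJoinSketch {

    // A minimal tuple: a join key plus a payload (both names are assumptions).
    static final class Tuple {
        final String key;
        final String value;

        Tuple(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<Tuple> stream1Buffer = new ArrayList<>();
    private final List<Tuple> stream2Buffer = new ArrayList<>();

    // Arriving tuples are only buffered; nothing is joined at arrival time.
    void onStream1(Tuple t) {
        stream1Buffer.add(t);
    }

    void onStream2(Tuple t) {
        stream2Buffer.add(t);
    }

    // Called once when the window triggers: join everything buffered in
    // "stream1" with everything buffered in "stream2" on the key, then
    // clear both buffers (tumbling-window assumption).
    List<String> onWindowTrigger() {
        List<String> results = new ArrayList<>();
        for (Tuple left : stream1Buffer) {
            for (Tuple right : stream2Buffer) {
                if (left.key.equals(right.key)) {
                    results.add(left.key + ":" + left.value + "," + right.value);
                }
            }
        }
        stream1Buffer.clear();
        stream2Buffer.clear();
        return results;
    }
}
```

In this sketch each matching pair is emitted exactly once, because the buffers
are cleared on every trigger. If the buffers instead kept tuples that overlap
with the next window (a sliding window), the same pairs would be emitted again
on the next trigger.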
cc [~roshan_naik]
> JoinBolt.java 's paradigm is new model of stream join?
> ------------------------------------------------------
>
> Key: STORM-2761
> URL: https://issues.apache.org/jira/browse/STORM-2761
> Project: Apache Storm
> Issue Type: Question
> Components: storm-client
> Reporter: Fei Pan
> Priority: Critical
>
> Hi, I am a researcher at the University of Toronto studying acceleration on
> stream processing platforms. I have a question about the model of
> window-based stream join used in JoinBolt.java. From my understanding, when a
> new tuple arrives, we join this new tuple with all the tuples in the window
> of the opposite stream. However, in JoinBolt.java, not only the new tuple but
> all the tuples in the entire local window are joined with the window of the
> opposite stream. This produces a lot of duplicated results, since most of the
> old tuples in the local window have been joined before. I don't know whether
> this is a new paradigm or whether the Storm team misunderstood the model of
> stream join. Can someone help me clarify this?