[ https://issues.apache.org/jira/browse/STORM-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181944#comment-16181944 ]
Arun Mahadevan commented on STORM-2761:
---------------------------------------

As per my understanding, the tuples are buffered in both streams and joined only once, when the window triggers. E.g., with a 1 min window, all tuples that arrived in "stream1" in the last 1 min are joined with all tuples that arrived in "stream2" in the last 1 min, when the 1 min completes. If it does not work that way, there might be a bug.

cc [~roshan_naik]

> JoinBolt.java 's paradigm is new model of stream join?
> ------------------------------------------------------
>
>                 Key: STORM-2761
>                 URL: https://issues.apache.org/jira/browse/STORM-2761
>             Project: Apache Storm
>          Issue Type: Question
>          Components: storm-client
>            Reporter: Fei Pan
>            Priority: Critical
>
> Hi, I am a researcher from the University of Toronto, and I am studying
> acceleration of stream processing platforms. I have a question about the
> model of window-based stream join used in JoinBolt.java. From my
> understanding, when a new tuple arrives, we join this new tuple with all the
> tuples in the window of the opposite stream. However, in JoinBolt.java, not
> only the new tuple but all the tuples in the entire local window are joined
> with the window of the opposite stream. This produces a lot of duplicated
> results, since most of the old tuples in the local window have been joined
> before. I don't know whether this is a new paradigm or the Storm team
> misunderstood the model of stream join. Can someone help me clarify this?
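
To make the two join models in this thread concrete, here is a minimal sketch in plain Python. It is a simulation only and does not use or reproduce Storm's JoinBolt code. All names (Event, window_trigger_join, per_tuple_join) and the sample timestamps are hypothetical, chosen for illustration. The first function joins the full contents of both windows each time the window triggers, as described in the comment; the second joins each arriving tuple once against the opposite stream's current window, as described in the question.

```python
from collections import namedtuple

# Hypothetical event type: timestamp in seconds, join key, payload.
Event = namedtuple("Event", ["ts", "key", "value"])


def window_trigger_join(left, right, length, slide, end_time):
    """Join the full contents of both windows each time the window triggers.

    Windows cover [trigger - length, trigger) and fire every `slide` seconds.
    """
    results = []
    trigger = length
    while trigger <= end_time:
        start = trigger - length
        lwin = [e for e in left if start <= e.ts < trigger]
        rwin = [e for e in right if start <= e.ts < trigger]
        results.extend(
            (l.value, r.value) for l in lwin for r in rwin if l.key == r.key
        )
        trigger += slide
    return results


def per_tuple_join(left, right, length):
    """Join each arriving tuple once against the opposite stream's window."""
    results = []
    lbuf, rbuf = [], []
    # Merge both streams into arrival order; side 0 = left, side 1 = right.
    arrivals = sorted(
        [(e.ts, 0, e) for e in left] + [(e.ts, 1, e) for e in right],
        key=lambda a: (a[0], a[1]),
    )
    for ts, side, e in arrivals:
        own, other = (lbuf, rbuf) if side == 0 else (rbuf, lbuf)
        # Expire tuples that have fallen out of the opposite window.
        other[:] = [o for o in other if ts - o.ts < length]
        for o in other:
            if o.key == e.key:
                pair = (e.value, o.value) if side == 0 else (o.value, e.value)
                results.append(pair)
        own.append(e)
    return results


# Sample data (hypothetical): two tuples per stream, all with the same key.
left = [Event(10, "a", "L1"), Event(70, "a", "L2")]
right = [Event(20, "a", "R1"), Event(80, "a", "R2")]

tumbling = window_trigger_join(left, right, length=60, slide=60, end_time=120)
sliding = window_trigger_join(left, right, length=60, slide=30, end_time=120)
incremental = per_tuple_join(left, right, length=60)
```

In this simulation, the tumbling configuration (slide equal to window length) emits each matching pair once. With a slide shorter than the window length, consecutive windows overlap, so re-joining the full window on every trigger emits the pair (L2, R2) twice. The per-tuple model emits no pair more than once, but it can also match tuples across what would be a tumbling-window boundary, so the two models do not produce the same result set in general. Whether JoinBolt behaves like the first function depends on how its window is configured, which this sketch does not attempt to verify.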