[
https://issues.apache.org/jira/browse/STORM-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720912#comment-14720912
]
Derek Dagit commented on STORM-996:
-----------------------------------
After discussion with [~revans2] and [~knusbaum], I am looking at correcting
out-of-order messages on the receiving side by adding sequence numbers to each
batch.
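To make the receiving-side idea concrete, here is a minimal sketch of a reorder buffer keyed by sequence number. It is illustrative only and not the actual patch: the class {{BatchReorderBuffer}}, its methods, and the assumption that the sender numbers batches from 0 per connection are all hypothetical, and nothing here is taken from Storm's Netty messaging code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical receiver-side reorder buffer (not part of Storm).
 * Assumes the sender tags each batch with a sequence number that
 * starts at 0 and increases by 1 per batch on a given connection.
 * Batches must be non-null.
 */
final class BatchReorderBuffer<T> {
    // Batches that arrived ahead of the one we are waiting for.
    private final Map<Long, T> pending = new HashMap<>();
    // Sequence number of the next batch that may be delivered.
    private long nextExpected = 0L;

    /**
     * Accepts one batch and returns every batch that is now
     * deliverable, in sequence order. Returns an empty list if the
     * batch has to wait for an earlier one, or if it is a duplicate.
     */
    synchronized List<T> accept(long seq, T batch) {
        List<T> ready = new ArrayList<>();
        if (seq < nextExpected || pending.containsKey(seq)) {
            return ready; // already delivered or already buffered
        }
        pending.put(seq, batch);
        // Drain the contiguous run starting at nextExpected.
        T next;
        while ((next = pending.remove(nextExpected)) != null) {
            ready.add(next);
            nextExpected++;
        }
        return ready;
    }

    /** Number of batches currently held back. */
    synchronized int pendingCount() {
        return pending.size();
    }
}
```

With this sketch, a batch that arrives early is held until its predecessor shows up, and then both are released together in order. A real fix would also need to bound the buffer and decide what to do when a batch never arrives, which this sketch does not address.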
Meanwhile, on [~kishorvpatil]'s suggestion, I looked at using the
[OrderedDownstreamThreadPoolExecutor|http://netty.io/3.9/api/org/jboss/netty/handler/execution/OrderedDownstreamThreadPoolExecutor.html]
where we initialize the Netty
[Context|https://github.com/apache/storm/blob/07e0ff2e206a0e0ca96da9137ebbd3fc7a5e9c8b/storm-core/src/jvm/backtype/storm/messaging/netty/Context.java#L58-L64].
This approach could have some impact on performance, as it guarantees that
downstream events are ordered per channel when sending. I tried it out and
found that the batch test still fails with individual messages appearing out of
order. The failure reproduces about as often as it does without the change.
{code}
<failure>expected: (= req_msg resp_msg)
actual: (not (= "64648" "64649"))
at: test_runner.clj:105</failure>
<failure>expected: (= req_msg resp_msg)
actual: (not (= "64649" "64648"))
at: test_runner.clj:105</failure>
<failure>expected: (= req_msg resp_msg)
actual: (not (= "68841" "68842"))
at: test_runner.clj:105</failure>
<failure>expected: (= req_msg resp_msg)
actual: (not (= "68842" "68841"))
at: test_runner.clj:105</failure>
{code}
> netty-unit-tests/test-batch demonstrates out-of-order delivery
> --------------------------------------------------------------
>
> Key: STORM-996
> URL: https://issues.apache.org/jira/browse/STORM-996
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 0.10.0
> Reporter: Derek Dagit
> Assignee: Derek Dagit
> Priority: Blocker
>
> backtype.storm.messaging.netty-unit-test/test-batch
> One example of output follows. Similar failures happen sporadically, and the
> number of failed assertions varies widely.
> Tuples are not just skewed, but actually seem to arrive out of order.
> {quote}
> actual: (not (= "66040" "66041"))
> at: test_runner.clj:105
> expected: (= req_msg resp_msg)
> actual: (not (= "66041" "66042"))
> at: test_runner.clj:105
> expected: (= req_msg resp_msg)
> actual: (not (= "66042" "66040"))
> at: test_runner.clj:105
> {quote}