Github user revans2 commented on the pull request:
https://github.com/apache/storm/pull/526#issuecomment-93840072
So after much searching and tracing through logs, with some added logging in
the CoordinatedBolt, I found that the CoordinatedBolt was timing out the batch
in a few cases. This happened whenever the batch took longer than 300 ms of
wall time to complete: the timeout is set to 30 seconds by default, and 10
seconds of simulated time equals 100 ms of wall time. When this happened the
bolts would get confused and the batch would never be fully acked. I am not
sure why the coordinator/spout was not getting a timeout and replaying the
batch in simulated time, but because this is a simulated-time issue that only
really shows up in this one test, I decided to increase the timeout. If others
think we should dig deeper and understand why the replay is not happening, I am
happy to hand the JIRA over to them.
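
For anyone checking the arithmetic, here is a rough sketch of the conversion
described above. It only assumes the ratio quoted in this comment (10 seconds
of simulated time per 100 ms of wall time). The class name, method names, and
the 60-second value are hypothetical and are not taken from Storm or from this
pull request.

```java
// Hypothetical helper, not part of Storm: converts a simulated-time timeout
// into the wall-clock budget it leaves a batch, using the ratio quoted above.
public class SimulatedTimeSketch {
    // Ratio from the comment: 10 s of simulated time ~ 100 ms of wall time.
    static final long SIMULATED_MS_PER_STEP = 10_000;
    static final long WALL_MS_PER_STEP = 100;

    // Wall-clock milliseconds that elapse while simulatedMillis pass.
    static long toWallMillis(long simulatedMillis) {
        return simulatedMillis * WALL_MS_PER_STEP / SIMULATED_MS_PER_STEP;
    }

    // True if a batch needing batchWallMillis of real time would exceed
    // a timeout expressed in simulated seconds.
    static boolean wouldTimeOut(long batchWallMillis, long timeoutSimulatedSecs) {
        return batchWallMillis > toWallMillis(timeoutSimulatedSecs * 1000);
    }

    public static void main(String[] args) {
        // Default 30 s timeout leaves 300 ms of wall time per batch.
        System.out.println(toWallMillis(30_000));
        // A batch taking 350 ms of wall time exceeds that budget.
        System.out.println(wouldTimeOut(350, 30));
        // With a hypothetical 60 s timeout the same batch fits.
        System.out.println(wouldTimeOut(350, 60));
    }
}
```

The point is only that the default leaves a very small real-time budget, so a
slightly slow batch on a loaded build machine can cross it.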