Github user revans2 commented on the pull request:
https://github.com/apache/storm/pull/765#issuecomment-143872114
I am just rebasing the code now so you can test that out yourself. This
code has no issues with acking. But there are a few real issues.
First, the latency on low-throughput queues is much higher, because a tuple
has to wait for the batch to time out before it is emitted. That timeout is
set to 1 ms by default, so it is not that bad, but in a follow-on JIRA we
should be able to dynamically adjust the batch size for each queue on the fly
to compensate.
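To make the latency trade-off concrete, here is a minimal sketch of size-or-timeout batching. All names (`Batcher`, `targetSize`, `flushIntervalMs`) are hypothetical and are not Storm's actual disruptor code; it only illustrates why a low-throughput queue pays the flush timeout as latency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Storm's implementation: a batch is flushed when it
// reaches targetSize, or when flushIntervalMs has elapsed since the first
// pending item. On a low-throughput queue the size threshold is rarely hit,
// so nearly every tuple waits out the timeout (1 ms by default, per above).
class Batcher<T> {
    private final int targetSize;
    private final long flushIntervalMs;
    private final List<T> pending = new ArrayList<>();
    private long firstPendingAt = -1;

    Batcher(int targetSize, long flushIntervalMs) {
        this.targetSize = targetSize;
        this.flushIntervalMs = flushIntervalMs;
    }

    /** Add an item; returns the flushed batch if it filled up, else null. */
    List<T> add(T item, long nowMs) {
        if (pending.isEmpty()) {
            firstPendingAt = nowMs;
        }
        pending.add(item);
        return pending.size() >= targetSize ? flush() : null;
    }

    /** Called periodically; flushes a partial batch once the timeout expires. */
    List<T> maybeFlushOnTimeout(long nowMs) {
        if (!pending.isEmpty() && nowMs - firstPendingAt >= flushIntervalMs) {
            return flush();
        }
        return null;
    }

    private List<T> flush() {
        List<T> out = new ArrayList<>(pending);
        pending.clear();
        firstPendingAt = -1;
        return out;
    }
}
```

The dynamic adjustment mentioned above would amount to shrinking `targetSize` (or the timeout) when a queue is observed to flush mostly on timeouts rather than on size.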
Second, the number of threads used is noticeably higher: one more per
disruptor queue. I expect to reduce the total number of disruptor queues once
I have optimized other parts of the code. I don't want to do that right now,
because the two-queues-per-bolt/spout design still improves performance in
many cases.
Third, in the worst-case situation it is possible to allocate many more
objects than previously. In practice it is not that many more relative to the
large number of objects we already allocate, but overall allocation needs to
be looked at in a separate JIRA at some point.
Also, I don't want to shove this code in without doing a real comparison
between the two approaches and their code. This is one way of doing batching,
but there are others that may have advantages over it, or may complement this
approach as well. I just want Storm to eventually be a lot closer to the 1
million sentences/second mark than it is now.