Github user mjsax commented on the pull request: https://github.com/apache/storm/pull/694#issuecomment-134656876

I just checked some older benchmark results for batching in user land, i.e., on top of Storm (=> Aeolus). In that case, a batch size of 100 increased the spout output rate by a factor of 6 (instead of the 1.5 the benchmark above shows). The benchmark should therefore yield more than 70M tuples per 30 seconds (and not about 19M).

Of course, batching is done a little differently now. In Aeolus, a fat tuple is used as the batch, so the system sees only a single batch-tuple. Here, the system sees all tuples, but emitting is delayed until the batch is full (this still saves the overhead of going through the disruptor for each tuple). However, we generate a tuple ID for each tuple in the batch instead of a single ID per batch. I am not sure how expensive this is. Because acking was not enabled, it should not be too expensive, since the IDs do not have to be "registered" at the ackers (right?).

As a further optimization, it might be a good idea not to batch whole tuples, but only the `Values` and tuple ID. The `worker-context`, `task-id`, and `outstream-id` are the same for all tuples within a batch. I will try this out and push a new version in the next few days if it works.
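A minimal sketch of the emit-delaying idea described above, with no Storm dependencies: the class name `OutputBatcher`, the `Consumer` standing in for the disruptor publish, and the per-tuple `long` message IDs are all hypothetical illustration, not code from this PR. It buffers only the per-tuple payload and ID, keeps the shared metadata (here just a stream ID) once per batch, and forwards downstream only when the batch is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: batch only per-tuple payloads and message IDs;
// metadata shared by all tuples in the batch is stored once.
public class OutputBatcher<V> {
    private final int batchSize;
    private final String streamId;                 // shared across the whole batch
    private final List<V> values = new ArrayList<>();
    private final List<Long> messageIds = new ArrayList<>();
    private final Consumer<List<V>> downstream;    // stands in for the disruptor publish

    public OutputBatcher(int batchSize, String streamId, Consumer<List<V>> downstream) {
        this.batchSize = batchSize;
        this.streamId = streamId;
        this.downstream = downstream;
    }

    // Called once per tuple; the (expensive) downstream call happens once per batch.
    public void emit(V value, long messageId) {
        values.add(value);
        messageIds.add(messageId);
        if (values.size() >= batchSize) {
            flush();
        }
    }

    // Force out a partial batch, e.g. on a timeout or shutdown.
    public void flush() {
        if (values.isEmpty()) {
            return;
        }
        downstream.accept(new ArrayList<>(values)); // one publish for the whole batch
        values.clear();
        messageIds.clear();
    }
}
```

With a batch size of 100, this turns 100 per-tuple publishes into one, which is where the spout-output speedup in the Aeolus measurements comes from.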