Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2241
@roshannaik I appreciate your last comment and your effort to summarize the
concerns that have been raised.
> 1. Better handling of low throughput Topos.
Yes, lower CPU usage and lower latency by default. If all this takes is
changing some default configs, then let's do that. I am very concerned about
having something that requires a lot of manual tuning. Most users will not
know how to do it; they will end up copying and pasting something off the
internet and getting it wrong. That is why I was running my tests with
out-of-the-box performance.
I also want to be sure that we pay attention to a mixed-use-case topology,
such as one with DRPC queries. One part of your topology may have high
throughput (the data path), while another part (the DRPC control/query path)
has very low throughput. Waiting seconds for a DRPC query to fill a batch
that will never fill is painful.
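To make that concern concrete, here is a minimal sketch of why batching needs a time-based flush in addition to a size trigger. This is plain Java with invented names (`TimedBatch`, `flushIfDue`), not Storm's actual batching code; it only illustrates the pattern:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a batch buffer that emits when it fills OR when a
// flush interval elapses. Without the timed flush, a low-throughput stream
// (like a DRPC control/query path) waits indefinitely for a batch that
// never fills.
public class TimedBatch<T> {
    private final int batchSize;
    private final long flushIntervalMillis;
    private final List<T> buffer = new ArrayList<>();
    private long lastFlushMillis;

    public TimedBatch(int batchSize, long flushIntervalMillis, long startMillis) {
        this.batchSize = batchSize;
        this.flushIntervalMillis = flushIntervalMillis;
        this.lastFlushMillis = startMillis;
    }

    /** Buffer an item; returns a batch to emit if one is due, else null. */
    public List<T> add(T item, long nowMillis) {
        buffer.add(item);
        if (buffer.size() >= batchSize
                || nowMillis - lastFlushMillis >= flushIntervalMillis) {
            return drain(nowMillis);
        }
        return null;
    }

    /** Driven by a periodic timer so partial batches still get flushed. */
    public List<T> flushIfDue(long nowMillis) {
        if (!buffer.isEmpty()
                && nowMillis - lastFlushMillis >= flushIntervalMillis) {
            return drain(nowMillis);
        }
        return null;
    }

    private List<T> drain(long nowMillis) {
        List<T> out = new ArrayList<>(buffer);
        buffer.clear();
        lastFlushMillis = nowMillis;
        return out;
    }
}
```

With a sane default flush interval, the worst-case added latency on the low-throughput path is bounded by that interval rather than by how long the batch takes to fill, which is the kind of behavior that should not require per-topology manual tuning.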
> 2. TVL topo: Able to run this ...
OK, but to me it was just an indication that something had changed
drastically and not necessarily in a good way. My big concern is not TVL. I
don't really care much about that (and we can discuss benchmark/testing
methodologies on a separate JIRA). It is that with STORM-2306 there are some
seriously counterintuitive situations (the low throughput you already called
out) and some really scary pitfalls (the CPU contention which was [mentioned
above](https://github.com/apache/storm/pull/2241#issuecomment-318494665)), and
I want to be sure that they are addressed in some way. As an end user, I see
that one of my bolts is backed up, so I increase the parallelism, and the
performance gets much worse with no indication at all in any logs or metrics
of why it got worse. At a minimum we need a good way to know when this is
happening, and ideally the performance should degrade gracefully instead.
> 3. Bug in Multi worker mode prevents inter-worker communication.
I was wrong; this works. I was just seeing messages time out because of the
original problem with the host being overloaded, and I misinterpreted it.
> 5. Some "real-world topology" runs as in addition to benchmark style
topos.
Yes, and preferably ones that run on more than one machine. Ideally also some
with multiple different topologies running at the same time on the same
cluster, so we can see what happens when there is CPU contention. I would also
add that it would be good to observe someone with a currently working topology
try to run it under the new system. It might help us see where we need better
documentation or need to adjust default settings.
...
> 6. Get some more runs of TVL.
I am happy to provide some of that. I spent some time on it over the past few
days trying to understand better how this patch compares to what is on master,
but I'll put that in a separate post, as this one is getting long already and
I may have to talk about benchmark methodology some.