It should, especially for the ackers. The ackers receive lots and lots of small messages, and those messages come from all over your topology. What's more, if you have max.spout.pending set, how quickly the messages can get to the ackers and back to the spouts determines the throughput of your topology to some degree. But it all depends on what the bottleneck in your topology actually is. If it is the network/network ping time, then scheduling all of the components of your topology close to each other is important. If it is CPU or memory, then you need to spread them out more to get more free resources on other nodes. This is roughly what RAS (the Resource Aware Scheduler) tries to do, but it does it just from guesses supplied by the topology owner. In future releases we expect to add elasticity to RAS so that it can look at the actual resources being used and take that into account when scheduling, because each topology is different.
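For reference, the max.spout.pending knob Bobby mentions is a per-topology config key; a minimal fragment might look like this (the value 1000 is only an illustrative number, not a recommendation):

```yaml
# Per-topology config fragment: cap the number of un-acked tuples each
# spout task may have in flight. Once this cap is hit, the spout stalls
# until acks come back, so acker round-trip latency directly limits
# spout throughput.
topology.max.spout.pending: 1000
```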
- Bobby

On Tuesday, August 1, 2017, 3:40:11 PM CDT, AMir Firouzi <[email protected]> wrote:

Thanks Bobby for your instant and informative reply. I actually respect these rules: I schedule all of these loggers and ackers, but right now my scheduler puts all the system tasks (logger and acker tasks) into one worker on one machine, and I'm not getting the best performance! I think it's because all of the other tasks have to transfer data to these tasks on other machines, and the network latency slows Storm down. But I'm wondering: if I put some of these system tasks near the other (bolt/spout) tasks, would it affect the performance?

Thanks again for your answer.

On Tue, Aug 1, 2017 at 6:20 PM Bobby Evans <[email protected]> wrote:

> By default there are no `_eventlogger` tasks. To have this feature
> enabled you need to turn it on by setting topology.eventlogger.executors to
> a positive number. Ackers are on by default, but can be disabled by
> setting topology.acker.executors to 0. You should respect
> these when scheduling a topology, because if they are supposed to be there
> and they are not scheduled, messages will still be sent to them, but they
> will be lost. In the case of acking, all of the tuples will time out. In
> the case of the event logger, the UI will show it working, but nothing
> will ever come out.
>
> Now that is on a per-topology basis, not on a per-worker basis. These
> bolts are like any other bolt. They can be in any worker your scheduler
> wants to put them in. When inserting an acker bolt, it is using a keyed
> grouping connected to just about everything in your topology, so where you
> place it is not that critical, as it is going to be talking to everything.
> The event logger bolts are similar, but use a fields grouping based off
> of component id.
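The two settings Bobby names are per-topology config keys; a minimal fragment (the executor counts here are illustrative, not defaults):

```yaml
# Per-topology config fragment.
topology.acker.executors: 2        # on by default; set to 0 to disable acking
topology.eventlogger.executors: 1  # event logging is off unless set to a positive number
```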
>
> https://github.com/apache/storm/blob/4c8a986f519cdf3e63bed47e9c4f723e4867267a/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L346-L357
>
> You could try to be smart and collocate each component with the event
> logger for it, but honestly this feature slows your topology down so much
> already that it is probably not worth trying to optimize; it really will
> only be used when you need to do some serious debugging.
>
> - Bobby
>
> On Tuesday, August 1, 2017, 4:44:55 AM CDT, AMir Firouzi <
> [email protected]> wrote:
>
> Hi guys,
> I'm working on my own scheduler for Storm. I wonder what happens if I
> create a worker process and put some tasks in it (bolt/spout tasks) but no
> _eventlogger and _acker tasks. Is it a problem? Will tuples
> transferred/emitted from tasks in this worker be skipped, or will they
> just use another _acker or _eventlogger in other workers?
>
> Thanks in advance
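The routing Bobby describes for the event logger bolts can be sketched roughly as follows. This is not Storm's actual implementation (see the linked StormCommon.java for that); the class and method names here are hypothetical, and it only illustrates the idea that a fields grouping on component id deterministically maps each component to one event logger task, which is why placement matters less than it might seem:

```java
// Hedged sketch: a fields grouping on a single field (the component id)
// effectively hashes that field and picks one downstream task, so every
// event from a given component always lands on the same event logger task.
public class EventLoggerGroupingSketch {

    // Pick a task index by hashing the component id (hypothetical helper,
    // not Storm's real code).
    static int chooseTask(String componentId, int numEventLoggerTasks) {
        return Math.floorMod(componentId.hashCode(), numEventLoggerTasks);
    }

    public static void main(String[] args) {
        // The same component always maps to the same task index.
        System.out.println(chooseTask("wordSpout", 2));
        System.out.println(chooseTask("countBolt", 2));
    }
}
```

Because the mapping depends only on the component id and the task count, a scheduler cannot influence which event logger task a component talks to, only where that task lives.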
