Thanks again Bobby, that's exactly what I'm doing right now. I try to schedule the components in a way that reduces network latency, and after a while, based on resource usage, the scheduler tries to make wiser decisions.
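The "pack connected components close together" idea discussed in this thread can be sketched in a few lines. This is a toy simulation, not Storm's scheduler API (a real implementation would implement org.apache.storm.scheduler.IScheduler); the class and method names here are hypothetical:

```java
import java.util.*;

// Toy sketch of latency-oriented placement: greedily fill one node to
// capacity before spilling to the next, so executors that exchange tuples
// (spouts, bolts, ackers) stay on the same machine and off the network.
public class PackScheduler {
    static Map<String, List<String>> place(List<String> executors,
                                           List<String> nodes, int slotsPerNode) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String n : nodes) assignment.put(n, new ArrayList<>());
        int i = 0;
        for (String exec : executors) {
            // Move to the next node only once the current one is full.
            while (assignment.get(nodes.get(i)).size() >= slotsPerNode) i++;
            assignment.get(nodes.get(i)).add(exec);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> execs = Arrays.asList("spout-1", "bolt-1", "bolt-2", "__acker-1");
        Map<String, List<String>> a =
                place(execs, Arrays.asList("nodeA", "nodeB"), 4);
        // All four executors fit on nodeA, so no tuple crosses the network.
        System.out.println(a);
    }
}
```

As Bobby notes below, this packing only helps when the bottleneck is network latency; when CPU or memory is the constraint, the opposite strategy (spreading out) wins.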
On Thu, Aug 3, 2017 at 7:49 PM Bobby Evans <[email protected]> wrote:

> It should, especially for the ackers. The ackers receive lots and lots of
> small messages, and those messages come from all over your topology. What
> is more, if you have max.spout.pending set, how quickly the messages can
> get to them and back to the spouts determines the throughput of your
> topology to some degree. But it all depends on what actually is the
> bottleneck in your topologies. If it is the network/network ping time,
> then scheduling all of the components of your topology close to each
> other is important. If it is the CPU or memory, then you need to spread
> them out more to get more free resources on other nodes. This is kind of
> what RAS tries to do, but it does it just from guesses supplied by the
> topology owner. In future releases we expect to add elasticity to RAS so
> that it can look at the actual resources being used and take that into
> account when scheduling, because each topology is different.
>
> - Bobby
>
> On Tuesday, August 1, 2017, 3:40:11 PM CDT, AMir Firouzi <
> [email protected]> wrote:
>
> Thanks Bobby for your instant & informative reply,
> I actually respect these rules. I schedule all of these loggers and
> ackers, but right now my scheduler puts all the system tasks (logger and
> acker tasks) into one worker on one machine, and I'm not getting the best
> performance! I think it's because all of the other tasks have to transfer
> data to these tasks on another machine, and the network latency slows
> Storm down. But I'm wondering: if I put some of these system tasks near
> the other (bolt/spout) tasks, would it affect the performance?
> Thanks again for your answer.
>
> On Tue, Aug 1, 2017 at 6:20 PM Bobby Evans <[email protected]>
> wrote:
>
> > By default there are no `_eventlogger` tasks. To have this feature
> > enabled you need to turn it on by setting
> > topology.eventlogger.executors to a positive number.
> > Ackers are on by default, but can be disabled by setting
> > topology.acker.executors to 0. You should respect these when scheduling
> > a topology, because if they are supposed to be there and they are not
> > scheduled, messages will be sent to them but will be lost. In the case
> > of acking, all of the tuples will time out. In the case of the event
> > logger, the UI will show it working, but nothing will ever come out.
> > Now, that is on a per-topology basis, not a per-worker basis. These
> > bolts are like any other bolt: they can be in any worker your scheduler
> > wants to put them in. When inserting an acker bolt, Storm uses a keyed
> > grouping connected to just about everything in your topology, so where
> > you place it is not that critical, as it is going to be talking to
> > everything. The event logger bolts are similar, but use a fields
> > grouping based off of component id.
> >
> > https://github.com/apache/storm/blob/4c8a986f519cdf3e63bed47e9c4f723e4867267a/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L346-L357
> >
> > You could try to be smart and collocate each component with the logger
> > for it, but honestly this feature slows your topology down so much
> > already that it is probably not worth trying to optimize, as it really
> > will only be used when you need to do some serious debugging.
> >
> > - Bobby
> >
> > On Tuesday, August 1, 2017, 4:44:55 AM CDT, AMir Firouzi <
> > [email protected]> wrote:
> >
> > Hi guys,
> > I'm working on my own scheduler for Storm. I wonder what happens if I
> > create a worker process and put some tasks in it (bolt/spout tasks) but
> > no _eventlogger and _acker tasks. What happens? Is it a problem? Will
> > tuples transferred/emitted from tasks in this worker be skipped, or
> > will they just use another _acker or _eventlogger in other workers?
> >
> > Thanks in advance
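For reference, the two settings discussed in the quoted thread can be set in the topology configuration. The keys below are taken verbatim from the thread; the example values are just illustrative:

```yaml
# Topology configuration (e.g. storm.yaml or per-topology conf)
topology.acker.executors: 2        # on by default; set to 0 to disable acking
topology.eventlogger.executors: 1  # off by default; a positive number enables event logging
```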

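Bobby's point that unscheduled ackers cause every tuple to time out follows from how ackers track tuple trees: each acker XORs every anchor id it hears about, once when a tuple is emitted and once when it is acked, so the running value returns to zero only when the whole tree completes. If the acker never receives those messages, nothing ever completes. A minimal self-contained sketch of the XOR idea (illustrative names, not Storm's actual acker implementation):

```java
import java.util.*;

// Sketch of the acker's XOR trick: XORing the same anchor id twice
// (once on emit, once on ack) cancels it out, so a root tuple's value
// reaches 0 exactly when every tuple in its tree has been acked.
public class AckerSketch {
    private final Map<Long, Long> pending = new HashMap<>(); // rootId -> XOR value

    // Called for both "emit" (new anchor) and "ack" (anchor finished).
    void update(long rootId, long anchorId) {
        pending.merge(rootId, anchorId, (a, b) -> a ^ b);
    }

    boolean isComplete(long rootId) {
        return pending.getOrDefault(rootId, 0L) == 0L;
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();
        long root = 1L, childA = 42L, childB = 99L;
        acker.update(root, childA);                  // spout emits anchored tuple A
        acker.update(root, childB);                  // bolt emits anchored tuple B
        acker.update(root, childA);                  // A acked
        System.out.println(acker.isComplete(root));  // false: B still pending
        acker.update(root, childB);                  // B acked
        System.out.println(acker.isComplete(root));  // true: tree complete
    }
}
```

This also shows why acker placement matters less than acker presence: the XOR stream is tiny per message but comes from every component, which is exactly the "lots and lots of small messages" pattern described above.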