Hi Basti,

basically, our modification works similarly: we change the number of actual threads within the range of tasks. In doing so, we transfer the queues where needed so that processing can continue directly. To get rid of workers, we modified the shutdown function a bit. Typically this is rather fast and the pause in processing is around 20 ms - the 50 ms were just an upper limit we have observed so far. Due to the queue transfer there may be peaks in throughput, but the more spare resources are available, the smaller the effects we noticed (of course). However, so far we do not really take the state of the thread into account when changing the parallelization; we plan to address this, as it is on the project's agenda anyway.
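To give an idea, below is a minimal, simplified sketch of this kind of thread/queue handover; ExecutorScaler and ExecutorThread are made-up names for illustration, not Storm classes and not our actual code. On scale-in, the pending queue of the stopped thread is drained into a surviving thread so processing can go on.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Simplified sketch (not Storm API): change the number of executor threads
 * while the task range stays fixed; on scale-in, the pending queue of the
 * stopped thread is drained into a surviving thread so processing goes on.
 */
public class ExecutorScaler {

    /** A worker thread draining its own tuple queue. */
    static class ExecutorThread extends Thread {
        final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        private volatile boolean running = true;

        @Override
        public void run() {
            try {
                while (running || !queue.isEmpty()) {
                    Object tuple = queue.poll(10, TimeUnit.MILLISECONDS);
                    if (tuple != null) {
                        process(tuple);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        void process(Object tuple) { /* run the bolt/spout logic for this tuple */ }

        /** Stop taking new work; remaining tuples are transferred by the caller. */
        void shutdown() { running = false; }
    }

    private final List<ExecutorThread> threads = new ArrayList<>();

    /** Adjust the thread count; on shrink, transfer queues before stopping. */
    public synchronized void rescale(int targetThreads) throws InterruptedException {
        while (threads.size() < targetThreads) {                       // scale out
            ExecutorThread t = new ExecutorThread();
            threads.add(t);
            t.start();
        }
        while (threads.size() > targetThreads && threads.size() > 1) { // scale in,
            ExecutorThread victim = threads.remove(threads.size() - 1); // keep >= 1
            ExecutorThread survivor = threads.get(0);
            victim.shutdown();
            victim.queue.drainTo(survivor.queue);                      // queue transfer
            victim.join();                                             // short pause only
        }
    }
}

Of course, in a real setting the routing of new tuples would also have to be switched to the surviving threads.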
In addition, we can create threads across workers, i.e., if required we start a new worker on a different machine. For this, we extended the assignment a bit in order to define the intended sequence in which the workers shall be modified. Basically, this is just a combination of starting/stopping workers and starting/stopping threads and is similarly fast. However, this modification might not be fully in line with the Thrift view on the topology, but researchers try doing such things ;) So far, we did not touch Nimbus or further parts, as we tried to stay as "compatible" as possible. Thus, the "control" for this is currently written in Java and lives outside of Storm, which is of course a downside from the Storm point of view. A rough sketch of how I read the supervisor side of the assignment flow you describe is attached below the quoted thread.

Best,
Holger

On 16.12.2015 05:36, 刘键(Basti Liu) wrote:
> Hi Holger,
>
> I am the developer of JStorm. We are also interested in the dynamic update
> of the parallelization of Bolts/Spouts at runtime. :-)
> Currently, JStorm supports both scale-out and scale-in of parallelization.
> The following is a brief description of the design in JStorm.
> 1. The basic motivation is to keep the current assignment consistent, so that
> the impact on running workers is as small as possible.
> 2. When receiving the new parallelization configuration, the nimbus scheduler
> updates the assignment and uploads the new assignment to ZooKeeper.
> 3. The supervisors are notified via their watch and then create/kill the
> workers affected by the parallelization update.
> 4. Each worker updates its connections to remote workers and creates/shuts down
> executors (tasks) according to the new assignment when refreshing connections.
> Actually, the response to creating/killing workers in the supervisor is very quick,
> since the notification from ZooKeeper is fast. But the connection refresh is
> performed periodically in the worker; normally it takes a few seconds to finish.
> Besides that, for the scale-in scenario it might be better to deactivate the
> topology for a while and then start the scheduling of the new parallelization
> configuration, because the update of the connection state is asynchronous among
> workers. Without any delay, it is possible that a worker sends tuples to a remote
> executor (task) which is being shut down. This can cause message loss or jam the
> netty server/receiving queue of downstream workers.
> So the total amount of time for scale-out/scale-in in JStorm is around 30 ~ 45 seconds.
> I am not sure about the definition of the "50 ms" you mentioned below. Is it the time
> to start/kill workers, or the time until all workers are back in service correctly?
>
> Regards
> Basti
>
> -----Original Message-----
> From: Holger Eichelberger [mailto:[email protected]]
> Sent: Wednesday, December 16, 2015 6:19 AM
> To: Bobby Evans; [email protected]
> Subject: Re: Runtime adaptive storm
>
> Bobby,
>
> great, I will have a deeper look at JStorm and the JIRA issue.
>
> Holger
>
> On 15.12.2015 16:08, Bobby Evans wrote:
>> Holger,
>>
>> Yes, I think it is something that we would be interested in. The
>> biggest thing would be working out the differences between your work
>> and the work that JStorm has also done that is similar. We are in the
>> process of merging with the JStorm project, and pulling in some of the
>> feature work that they have done.
>>
>> I filed https://issues.apache.org/jira/browse/STORM-1330 to evaluate
>> and port a very similar feature from JStorm, but if you want to
>> jump on the JIRA and collaborate I think it would be great.
>>
>> - Bobby
>>
>>
>> On Tuesday, December 15, 2015 8:37 AM, Holger Eichelberger
>> <[email protected]> wrote:
>>
>> Hi,
>>
>> we are working in a funded project on adaptive data stream processing
>> where we use Storm as a basis. Among other runtime changes to the data
>> processing, we faced the problem that we cannot modify the
>> parallelization of Bolts/Spouts or their execution location at runtime.
>> As the rebalance command of Storm is too slow for our purposes, we
>> developed a proof-of-concept implementation which allows us to
>> perform the needed changes (less than 50 ms in initial experiments).
>>
>> Now, we wondered whether this could be interesting for the Storm
>> community, and Nathan suggested just posting the current state on the
>> dev list in order to get in contact.
>>
>> Looking forward to reading from you.
>>
>> Cheers,
>> Holger
>
>
> --
> ------------------------------------------------------------------------
> Dr. Holger Eichelberger
> Software Systems Engineering
> University of Hildesheim         Tel.: +49 (0) 5121 / 883-40334
> Institute of Computer Science    Fax.: +49 (0) 5121 / 883-40331
> Universitätsplatz 1              [email protected]
> D-31141 Hildesheim, Germany
>
> http://www.uni-hildesheim.de/de/eichelberger.htm
>
> PGP public key:
> https://www.uni-hildesheim.de/media/fb4/informatik/AG_SSE/pgp/sse_he_pubkey.asc
> PGP fingerprint:
> 4522 685C 57E5 D0EA D5BA 4EDE C53F 5B9D 9F71 55E7
>
> ------------------------------------------------------------------------
>
> Adaptive Big Data Infrastructure for financial Risk Analysis (FP7)
> http://qualimaster.eu
>
> 1st Intl Big Data Processing - Reloaded Workshop
> http://bdpr2016.l3s.uni-hannover.de/
>
> ------------------------------------------------------------------------
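P.S. (for reference): a minimal sketch of how I read the supervisor side of the flow Basti describes in points 2 and 3 above (nimbus uploads the new assignment to ZooKeeper, the supervisor reacts to the watch notification and reconciles its workers). The znode path and the reconcile step are placeholders of mine, not the actual JStorm layout:

import java.io.IOException;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

/**
 * Illustrative sketch of a supervisor watching an assignment znode:
 * nimbus uploads a new assignment, the supervisor gets the watch
 * notification and reconciles its local workers. Path and reconcile
 * logic are placeholders, not the actual JStorm implementation.
 */
public class AssignmentWatcher implements Watcher {

    private final ZooKeeper zk;
    private final String assignmentPath;   // e.g. "/assignments/my-topology"

    public AssignmentWatcher(String connectString, String assignmentPath)
            throws IOException {
        this.zk = new ZooKeeper(connectString, 30_000, this);
        this.assignmentPath = assignmentPath;
    }

    /** Read the current assignment and re-arm the (one-shot) data watch. */
    public byte[] fetchAssignment() throws KeeperException, InterruptedException {
        return zk.getData(assignmentPath, this, new Stat());
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Watcher.Event.EventType.NodeDataChanged
                && assignmentPath.equals(event.getPath())) {
            try {
                reconcileWorkers(fetchAssignment());   // also re-arms the watch
            } catch (KeeperException | InterruptedException e) {
                // a real supervisor would log this and retry on its next sync round
            }
        }
    }

    /** Diff the new assignment against running workers, then create/kill workers. */
    private void reconcileWorkers(byte[] serializedAssignment) {
        // deserialize, start missing workers, shut down superfluous ones;
        // the workers then refresh their connections (point 4 above)
    }
}

Since ZooKeeper data watches are one-shot, fetchAssignment() would be called once at start-up and again from the handler to re-arm the watch.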
