Hi Basti,

basically, our modification works similarly when we change the number of
actual executor threads within the range of tasks. If needed, we transfer
the queues so that processing can continue directly. To get rid of
workers, we modified the shutdown function a bit. Typically this is
rather fast and the pause in processing is around 20 ms - the 50 ms were
just the upper limit we have observed so far. Due to the queue transfer
there may be peaks in throughput, but the more residual resources are
available, the smaller the effects we noticed (of course). However, so
far we do not really take the state of the thread into account when
changing the parallelization; we will address this, as it is on the
project's agenda anyway.
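
To give you a rough idea of the scale-in part, here is a very condensed
sketch (illustrative only, not our actual code; all class and method
names are made up):

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for one executor thread and its pending tuple queue.
final class ExecutorSlot {
    final BlockingQueue<Object> pending = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    void shutdownGracefully() {   // analogous to our modified shutdown
        running = false;          // the thread leaves its processing loop
    }

    boolean isRunning() { return running; }
}

final class ParallelismScaler {

    /** Removes one executor and hands its queued tuples to a survivor. */
    static void scaleIn(List<ExecutorSlot> executors) {
        if (executors.size() < 2) return;         // keep at least one executor
        ExecutorSlot victim = executors.remove(executors.size() - 1);
        ExecutorSlot survivor = executors.get(0);
        victim.shutdownGracefully();
        victim.pending.drainTo(survivor.pending); // queue transfer, processing goes on
    }
}

In the prototype this of course happens on Storm's internal queues and
within the existing task range, but the basic idea is the same.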

In addition, we can create threads across workers, i.e., if required,
start a new worker on a different machine. For this, we extended the
assignment a bit in order to define the intended sequence in which the
workers shall be modified. Basically, this is just a combination of
starting/stopping workers and starting/stopping threads, and it is
similarly fast. However, this modification might not be fully in line
with the Thrift view on the topology, but researchers try doing such
things ;)
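
Conceptually, the extended assignment is little more than an ordered list
of worker modifications, roughly like this (again just an illustrative
sketch with made-up names, not our actual Thrift-level representation):

import java.util.ArrayList;
import java.util.List;

// Hypothetical kinds of modification we sequence per worker.
enum WorkerAction { START_WORKER, STOP_WORKER, START_THREADS, STOP_THREADS }

final class WorkerModification {
    final String host;          // machine on which the worker runs / shall run
    final WorkerAction action;  // what to do with that worker
    final int threadDelta;      // how many executor threads to add or remove

    WorkerModification(String host, WorkerAction action, int threadDelta) {
        this.host = host;
        this.action = action;
        this.threadDelta = threadDelta;
    }
}

final class ExtendedAssignment {
    // The order of this list defines the sequence in which workers are modified.
    final List<WorkerModification> sequence = new ArrayList<>();
}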

So far, we did not touch Nimbus or further parts, as we tried to stay as
"compatible" as possible. Thus, the "control" for this is currently
written in Java and lives outside of Storm, which is of course a downside
from the Storm point of view.
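
The external control basically observes some load signal and appends
steps to such a plan, e.g. (purely illustrative, with a made-up threshold,
reusing the made-up types from the sketch above):

final class AdaptationController {
    private final ExtendedAssignment plan = new ExtendedAssignment();

    // Called with monitoring samples from the running topology.
    void onLoadSample(String host, double tuplesPerSecond) {
        if (tuplesPerSecond > 10_000) {            // illustrative threshold only
            plan.sequence.add(new WorkerModification(
                host, WorkerAction.START_THREADS, 1));
        } else if (tuplesPerSecond < 1_000) {
            plan.sequence.add(new WorkerModification(
                host, WorkerAction.STOP_THREADS, 1));
        }
    }
}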

Best,
Holger



On 16.12.2015 05:36, 刘键(Basti Liu) wrote:
> Hi Holger,
> 
> I am the developer of JStorm. We are also interested in the dynamic update 
> of the parallelization of Bolts/Spouts at runtime. :-)
> Currently, JStorm supports both scale-out and scale-in of parallelization.
> The following is a brief description of the design in JStorm.
> 1. The basic motivation is to keep the current assignment as consistent as 
> possible, so that the impact on running workers is as small as possible.
> 2. When receiving the new parallelization configuration, the nimbus scheduler 
> updates the assignment and uploads the new assignment to zookeeper.
> 3. The supervisors get the watch notification and then try to 
> create/kill the workers affected by the parallelization update.
> 4. Each worker updates its connections to remote workers and creates/shuts down 
> executors (tasks) according to the new assignment when refreshing connections.
> Actually, the response to creating/killing workers in the supervisor is very 
> quick, since the notification from zookeeper is fast. But the connection 
> refresh is performed periodically in the worker; normally it takes a few 
> seconds to finish. Besides that, for the scale-in scenario it might be better 
> to deactivate the topology for a while and then start the scheduling of the 
> new parallelization configuration, because updating the connection state is 
> asynchronous among the workers. Without any delay, it is possible that a 
> worker sends tuples to a remote executor (task) which is being shut down. 
> This causes message loss, or jams the netty server/receiving queue of the 
> downstream workers.
> So the total amount of time for a scale-out/scale-in in JStorm is around 30 ~ 
> 45 seconds.
> I am not sure about the definition of the "50 ms" you mentioned below. Is it 
> the time to start/kill the workers, or the time until all workers are back in 
> service correctly?
> 
> Regards
> Basti
> 
> -----Original Message-----
> From: Holger Eichelberger [mailto:[email protected]] 
> Sent: Wednesday, December 16, 2015 6:19 AM
> To: Bobby Evans; [email protected]
> Subject: Re: Runtime adaptive storm
> 
> Bobby,
> 
> great, I will have a deeper look at JStorm and the JIRA issue.
> 
> Holger
> 
> On 15.12.2015 16:08, Bobby Evans wrote:
>> Holger,
>>
>> Yes, I think it is something that we would be interested in.  The 
>> biggest thing would be working out the differences between your work, 
>> and the work that JStorm has also done that is similar.  We are in the 
>> process of merging with the JStorm project, and pulling in some of the 
>> feature work that they have done.
>>
>> I filed https://issues.apache.org/jira/browse/STORM-1330 to evaluate 
>> and port a very similar feature from JStorm, but if you or others want to 
>> jump on the JIRA and collaborate, I think it would be great.
>>  
>> - Bobby
>>
>>
>>
>> On Tuesday, December 15, 2015 8:37 AM, Holger Eichelberger 
>> <[email protected]> wrote:
>>
>>
>>
>> Hi,
>>
>> we are working in a funded project on adaptive data stream processing 
>> where we use Storm as a basis. Among other runtime changes to the data 
>> processing, we faced the problem that we cannot modify the 
>> parallelization of Bolts/Spouts or their execution location at runtime.
>> As the rebalance command of Storm is too slow for our purposes, we 
>> developed a proof-of-concept implementation, which allows us to 
>> perform the needed changes (less than 50 ms in initial experiments).
>>
>> Now, we wondered whether this could be interesting for the Storm 
>> community, and Nathan suggested just posting the current state on the 
>> dev list in order to get in contact.
>>
>> Looking forward to hearing from you.
>>
>> Cheers,
>> Holger
>>
>>
> 
> 
> --
> ------------------------------------------------------------------------
> Dr. Holger Eichelberger
> Software Systems Engineering
> University of Hildesheim         Tel.: +49 (0) 5121 / 883-40334
> Institute of Computer Science    Fax.: +49 (0) 5121 / 883-40331
> Universitätsplatz 1              [email protected]
> D-31141 Hildesheim, Germany
> 
> http://www.uni-hildesheim.de/de/eichelberger.htm
> 
> PGP public key:
> 
> https://www.uni-hildesheim.de/media/fb4/informatik/AG_SSE/pgp/sse_he_pubkey.asc
> PGP fingerprint:
>   4522 685C 57E5 D0EA D5BA  4EDE C53F 5B9D 9F71 55E7
> 
> ------------------------------------------------------------------------
> 
> Adaptive Big Data Infrastructure for financial Risk Analysis (FP7) 
> http://qualimaster.eu
> 
> 1st Intl Big Data Processing - Reloaded Workshop 
> http://bdpr2016.l3s.uni-hannover.de/
> 
> ------------------------------------------------------------------------
> 

