Hey Yong,

No problem. It is conceivable to add the feature you're talking about (or even full-blown auto-scaling), but we haven't felt a pressing need for it right now.

The reason that we don't have this feature is that Samza's partitions are statically assigned to a container. For example, if you run two containers, and four partitions, container 0 gets partitions 0 and 2, and container 1 gets partitions 1 and 3. If you were to then add a third container, the partition assignments change, since the new container needs to take control of some partitions.

Right now, Samza doesn't support dynamically relocating partitions in a container. This was a conscious decision to reduce operational complexity (to avoid requiring ZooKeeper in containers). Without dynamic partition reassignment, the Samza ApplicationMaster (AM) has to kill a container, then start it again with new partitions. Thus, to add a new container to a running job, the AM would first kill all containers, re-assign partitions so that the new container gets some partitions, and then start all the containers up again. This is nearly the same operation as just restarting the job. We opted to force a job restart because of this.

Furthermore, by restarting the Samza job, you get the added benefit that a job restart requires a config change to update the yarn.container.count config value, which means you get all the nice things that come with updating configuration (seeing who made the change, seeing when the change was made, seeing the current value, etc). This stuff is really useful in a production environment when multiple parties are involved.

Cheers,
Chris

From: Yong Yuan <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, October 14, 2013 1:12 PM
To: Chris Riccomini <[email protected]>
Subject: Re: Can Samza scale up stream tasks at runtime?

Thanks, Chris!

On Mon, Oct 14, 2013 at 8:54 AM, Chris Riccomini <[email protected]> wrote:

Hey Yong,

Samza does not support dynamic scaling.
You'd have to bump up your yarn.container.count (assuming you're using YARN) and re-start the job.

Cheers,
Chris

On 10/13/13 10:42 AM, "Yong Yuan" <[email protected]> wrote:

>Hi, folks,
>
>Does Samza support scaling up processing at runtime? That is, if I add new
>Kafka brokers or increase the number of partitions for a topic while my
>Samza job is running, will Samza allocate more stream tasks accordingly?
>Or do I have to relaunch the tasks?
>
>Thanks,
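
The static assignment described in Chris's reply can be sketched in a few lines of Python. This is only an illustration, assuming partitions are dealt out by partition number modulo container count. That rule matches the two-container example in the reply, but it is not confirmed here as Samza's actual logic. The names assign_partitions and moved_partitions are made up for this sketch and are not part of Samza.

```python
def assign_partitions(num_partitions, num_containers):
    """Hypothetical static assignment: partition p goes to container p % num_containers.

    Assumption: inferred from the example in the thread (container 0 gets
    partitions 0 and 2, container 1 gets partitions 1 and 3).
    """
    assignment = {c: [] for c in range(num_containers)}
    for p in range(num_partitions):
        assignment[p % num_containers].append(p)
    return assignment


def moved_partitions(before, after):
    """Return the sorted partitions whose owning container differs between two assignments."""
    owner_before = {p: c for c, parts in before.items() for p in parts}
    owner_after = {p: c for c, parts in after.items() for p in parts}
    return sorted(p for p in owner_before if owner_before[p] != owner_after[p])


# Four partitions on two containers, then on three containers.
two = assign_partitions(4, 2)
three = assign_partitions(4, 3)
moved = moved_partitions(two, three)

print(two)    # {0: [0, 2], 1: [1, 3]}
print(three)  # {0: [0, 3], 1: [1], 2: [2]}
print(moved)  # [2, 3]
```

Under this assumed rule, going from two to three containers changes the partition set of every container, not just the new one. That is consistent with the point in the reply that adding a container is nearly the same operation as restarting the whole job.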
