Hey Yong,

No problem. It is conceivable to add the feature you're talking about (or even 
full-blown auto-scaling), but we haven't felt a pressing need for it so far.

The reason that we don't have this feature is that Samza's partitions are 
statically assigned to containers. For example, if you run two containers with 
four partitions, container 0 gets partitions 0 and 2, and container 1 gets 
partitions 1 and 3. If you were to then add a third container, the partition 
assignments change, since the new container needs to take control of some 
partitions.
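
To make the example concrete, here is a rough sketch of a static assignment. 
It is not Samza's actual code: the function name and the modulo rule are 
assumptions, chosen only because they reproduce the two-container layout 
described above.

```python
def assign_partitions(partition_count, container_count):
    """Statically map each container ID to the partitions it owns.

    Assumption: partition p goes to container (p % container_count).
    """
    assignment = {c: [] for c in range(container_count)}
    for p in range(partition_count):
        assignment[p % container_count].append(p)
    return assignment


# Two containers, four partitions (the layout from the example above).
print(assign_partitions(4, 2))  # {0: [0, 2], 1: [1, 3]}

# Adding a third container changes who owns partitions 2 and 3.
print(assign_partitions(4, 3))  # {0: [0, 3], 1: [1], 2: [2]}
```

Under this assumed rule, the new container is not the only one affected: 
container 0 gives up partition 2 and picks up partition 3, and container 1 
gives up partition 3.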

Right now, Samza doesn't support dynamically relocating partitions between 
containers. This was a conscious decision to reduce operational complexity (to 
avoid requiring ZooKeeper in containers).

Without dynamic partition reassignment, the Samza ApplicationMaster (AM) has to 
kill a container, then start it again with new partitions. Thus, to add a new 
container to a running job, the AM would first kill all containers, re-assign 
partitions so that the new container gets some partitions, and then start all 
the containers up again. This is nearly the same operation as just restarting 
the job. We opted to force a job restart because of this.
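
As a rough illustration of why this ends up looking like a full restart, the 
sketch below lists which already-running containers would see their partition 
set change when the container count changes. The helper name and the modulo 
assignment rule are assumptions for illustration, not the AM's real logic.

```python
def containers_to_restart(partition_count, old_count, new_count):
    """Return the running container IDs whose partition set would change.

    Assumption: partition p is owned by container (p % container_count).
    """
    affected = set()
    for p in range(partition_count):
        old_owner = p % old_count
        new_owner = p % new_count
        if old_owner != new_owner:
            # Both the container losing p and the one gaining it are touched.
            affected.add(old_owner)
            affected.add(new_owner)
    # Only containers that already exist need to be killed and restarted.
    return sorted(c for c in affected if c < old_count)


# Four partitions, growing from two containers to three.
print(containers_to_restart(4, 2, 3))  # [0, 1]
```

In this example every running container is affected, which is why adding a 
container is nearly the same operation as restarting the job.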

Furthermore, forcing a job restart has an added benefit: it requires a config 
change to update the yarn.container.count value, which means you get all the 
nice things that come with updating configuration (seeing who made the change, 
when it was made, what the current value is, etc.). This is really useful in a 
production environment when multiple parties are involved.

Cheers,
Chris

From: Yong Yuan <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, October 14, 2013 1:12 PM
To: Chris Riccomini <[email protected]>
Subject: Re: Can Samza scale up stream tasks at runtime?

Thanks, Chris!


On Mon, Oct 14, 2013 at 8:54 AM, Chris Riccomini <[email protected]> wrote:
Hey Yong,

Samza does not support dynamic scaling. You'd have to bump up your
yarn.container.count (assuming you're using YARN) and re-start the job.

Cheers,
Chris

On 10/13/13 10:42 AM, "Yong Yuan" <[email protected]> wrote:

>Hi, folks,
>
>Does Samza support scaling up processing at runtime? That is, if I add new
>Kafka brokers or increase the number of partitions for a topic while my
>Samza job is running, will Samza allocate more stream tasks accordingly?
>Or do I have to relaunch the tasks?
>
>Thanks,

