Hi Dotan,

A Samza job will create one instance of your StreamTask class for each input 
partition. There is no particular limit to the number of such partitions you 
can have; the main limitation is that each partition requires a file handle on 
the Kafka brokers, so if you want to go over a few hundred, you'll need to be 
careful.
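
For concreteness, here is a minimal sketch of a StreamTask (the class name
and the per-message logic are just placeholders); Samza will create one
instance of a class like this for each input partition:

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    // One instance of this class is created per input partition.
    public class MyStreamTask implements StreamTask {
      public void process(IncomingMessageEnvelope envelope,
                          MessageCollector collector,
                          TaskCoordinator coordinator) {
        // envelope.getMessage() is the deserialized message from the
        // partition that this particular task instance owns.
        Object message = envelope.getMessage();
        // ... your per-message logic goes here ...
      }
    }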

The number of containers is independent of the number of input partitions. 
You can set it to use one container, in which case all StreamTasks will be in 
the same JVM and multiplexed onto a single thread. If you set it to use two 
containers, approximately half the StreamTasks will be in one JVM and 
approximately half in the other, and so on.
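
The container count is just a setting in the job's .properties file. If I'm 
remembering the 0.7.0 property name correctly it's yarn.container.count 
(please check the configuration docs to be sure), e.g.:

    # Run all StreamTask instances in one container / JVM (the default).
    yarn.container.count=1

    # Or spread the StreamTask instances across two containers / JVMs.
    # yarn.container.count=2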

If what you are talking about is several tasks in sequence within the same 
container (i.e. one task consuming another's output), that isn't supported 
by Samza right now. Every task's output has to be written to a stream. You can 
build your own mechanism for composing bits of logic within the same container, 
but Samza provides a deliberately low-level interface which doesn't include 
such a mechanism.
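
So to chain two stages, the usual pattern is to have the first job's task emit 
to an intermediate Kafka topic, and have a second job consume that topic as its 
input. A rough sketch of the emitting side (the system name "kafka" and the 
topic name are placeholders, adjust them to your setup):

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    public class FirstStageTask implements StreamTask {
      // Intermediate topic; the second job consumes this as its input.
      private static final SystemStream OUTPUT =
          new SystemStream("kafka", "first-stage-output");

      public void process(IncomingMessageEnvelope envelope,
                          MessageCollector collector,
                          TaskCoordinator coordinator) {
        Object result = envelope.getMessage(); // ... transform as needed ...
        collector.send(new OutgoingMessageEnvelope(OUTPUT, result));
      }
    }

The second job is then just another Samza job whose task.inputs includes 
kafka.first-stage-output.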

Hope that helps,
Martin

On 25 Nov 2014, at 06:58, Dotan Patrich <[email protected]> wrote:

> Hi,
> 
> We run a topology that contains multiple tasks and plan to add more to it
> in the near future. However, one of the key design issues I'm considering
> is how granular each Samza task should be: on the one hand, granular tasks
> are easier to integrate at different parts of the topology; on the other
> hand, each task has its own basic JVM memory requirement that restricts
> how many tasks a machine can host.
> 
> One thing I noticed in the documentation is that each Samza container can
> host several tasks:
> "The SamzaContainer is responsible for managing the startup, execution, and
> shutdown of one or more StreamTask
> <http://samza.incubator.apache.org/learn/documentation/0.7.0/api/overview.html>
> instances"
> 
> I thought this could be some sort of workaround for the memory concerns I
> have (assuming the CPU consumption of the streaming tasks works out OK).
> Can anyone share how to host several tasks in a single container? Are those
> only task instances for different partitions, or can they be different tasks
> altogether?
> 
> Thanks,
> Dotan
