Re: Runtime Execution Model

Lukas Steiblys Tue, 29 Sep 2015 17:35:54 -0700

Hi Yan,

If I understand correctly, a good way to increase parallelism inThreadJobFactory in 0.9.1 is to simply pass in N as the container count intoJobCoordinator(config, N) and then spin up a new ThreadJob for eachcontainer. Does that sound right?

I am not sure how this is affected by 0.10.0 yet as I see a lot of changesin that code.


Lukas

-----Original Message-----From: Yan Fang

Sent: Monday, September 14, 2015 11:08 AM
To: dev@samza.apache.org
Subject: Re: Runtime Execution Model

Hi Bruno,

AFAIK, there is no existing JobFactory that brings as many threads as the
partition number. But I think nothing stops you to implement this: you can
get the partition information from the JobCoordinator, and then bring as
many threads as the partition/task number.

Since the two local factories (ThreadJobFactory and ProcessJobFactory) are
mainly for development, there is no additional document. But most of the
code here
<https://github.com/apache/samza/tree/master/samza-core/src/main/scala/org/apache/samza/job/local>
is
self-explained.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Sat, Sep 12, 2015 at 1:47 PM, Bruno Bonacci <bruno.bona...@gmail.com>
wrote:

Hi,
I'm looking for additional documentation on the different RUNTIME
EXECUTION MODELS of the different `job.factory.class`.

I'm particularly interested on how each factory (ThreadJobFactory,

ProcessJobFactory and YarnJobFactory) will create tasks consume andprocess

messages out of Kafka and the thread model used.

I did a few tests with the ThreadJob factory consuming out of a kafka
topic with 5 partitions and I was expecting that it would use multiple
threads to consume/process the different partitions, however it is
using only one thread at runtime.

Is there any way to tell Samza to use multiple processing threads (1 per
partition)??


Thanks
Bruno

Re: Runtime Execution Model

Reply via email to