Hi Yan,
If I understand correctly, a good way to increase parallelism in
ThreadJobFactory in 0.9.1 is to simply pass in N as the container count into
JobCoordinator(config, N) and then spin up a new ThreadJob for each
container. Does that sound right?
I am not sure how this is affected by 0.10.0 yet as I see a lot of changes
in that code.
Lukas
-----Original Message-----
From: Yan Fang
Sent: Monday, September 14, 2015 11:08 AM
To: dev@samza.apache.org
Subject: Re: Runtime Execution Model
Hi Bruno,
AFAIK, there is no existing JobFactory that brings as many threads as the
partition number. But I think nothing stops you to implement this: you can
get the partition information from the JobCoordinator, and then bring as
many threads as the partition/task number.
Since the two local factories (ThreadJobFactory and ProcessJobFactory) are
mainly for development, there is no additional document. But most of the
code here
<https://github.com/apache/samza/tree/master/samza-core/src/main/scala/org/apache/samza/job/local>
is
self-explained.
Thanks,
Fang, Yan
yanfang...@gmail.com
On Sat, Sep 12, 2015 at 1:47 PM, Bruno Bonacci <bruno.bona...@gmail.com>
wrote:
Hi,
I'm looking for additional documentation on the different RUNTIME
EXECUTION MODELS of the different `job.factory.class`.
I'm particularly interested on how each factory (ThreadJobFactory,
ProcessJobFactory and YarnJobFactory) will create tasks consume and
process
messages out of Kafka and the thread model used.
I did a few tests with the ThreadJob factory consuming out of a kafka
topic with 5 partitions and I was expecting that it would use multiple
threads to consume/process the different partitions, however it is
using only one thread at runtime.
Is there any way to tell Samza to use multiple processing threads (1 per
partition)??
Thanks
Bruno