Hi, Bruno,

The number of partitions consumed by a single task is also configurable via
the partition assignment policies (job.systemstreampartition.
grouper.factory). By default, there are two partition assignment policies
implemented: org.apache.samza.container.grouper.stream.GroupByPartitionFactory
and 
org.apache.samza.container.grouper.stream.GroupBySystemStreamPartitionFactory.
The detailed explanation is available here:
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html

Thanks!

-Yi

On Mon, Sep 14, 2015 at 3:37 PM, <bruno.bona...@gmail.com> wrote:

> Hi Yi,
>
> Does a single task consume from a single partition or it consumes from
> more/all partitions?
>
> Thanks
> Bruno
>
> > On 14 Sep 2015, at 23:22, Yi Pan <nickpa...@gmail.com> wrote:
> >
> > Hi, Bruno,
> >
> > The number of containers are configurable in YarnJobFactory via
> > yarn.container.count.
> > Each container is a single threaded model and you can run multiple tasks
> in
> > a single container.
> > At maximum, you can have as many containers as the number of tasks in
> this
> > config to achieve 1 task / thread.
> >
> > Hope that clarifies the config a bit more for you.
> >
> > Thanks!
> >
> > -Yi
> >
> > On Mon, Sep 14, 2015 at 3:16 PM, Bruno Bonacci <bruno.bona...@gmail.com>
> > wrote:
> >
> >> Thanks Yan for writing me back,
> >>
> >> That's ok for ThreadJobFactory and ProcessJobFactory but what about the
> >> YarnJobFactory?
> >> How many task/executors will be spawning?
> >>
> >>
> >> Bruno
> >>
> >>> On Mon, Sep 14, 2015 at 7:08 PM, Yan Fang <yanfang...@gmail.com>
> wrote:
> >>>
> >>> Hi Bruno,
> >>>
> >>> AFAIK, there is no existing JobFactory that brings as many threads as
> the
> >>> partition number. But I think nothing stops you to implement this: you
> >> can
> >>> get the partition information from the JobCoordinator, and then bring
> as
> >>> many threads as the partition/task number.
> >>>
> >>> Since the two local factories (ThreadJobFactory and ProcessJobFactory)
> >> are
> >>> mainly for development, there is no additional document. But most of
> the
> >>> code here
> >>> <
> >>
> https://github.com/apache/samza/tree/master/samza-core/src/main/scala/org/apache/samza/job/local
> >>> is
> >>> self-explained.
> >>>
> >>> Thanks,
> >>>
> >>> Fang, Yan
> >>> yanfang...@gmail.com
> >>>
> >>> On Sat, Sep 12, 2015 at 1:47 PM, Bruno Bonacci <
> bruno.bona...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>> I'm looking for additional documentation on the different RUNTIME
> >>>> EXECUTION MODELS of the different `job.factory.class`.
> >>>>
> >>>> I'm particularly interested on how each factory (ThreadJobFactory,
> >>>> ProcessJobFactory and YarnJobFactory) will create tasks consume and
> >>> process
> >>>> messages out of Kafka and the thread model used.
> >>>>
> >>>> I did a few tests with the ThreadJob factory consuming out of a kafka
> >>>> topic with 5 partitions and I was expecting that it would use multiple
> >>>> threads to consume/process the different partitions, however it is
> >>>> using only one thread at runtime.
> >>>>
> >>>> Is there any way to tell Samza to use multiple processing threads (1
> >> per
> >>>> partition)??
> >>>>
> >>>>
> >>>> Thanks
> >>>> Bruno
> >>
>

Reply via email to