Kartik,

This is for the case when you don't use YARN. ThreadJob runs locally
and simply spins up a single thread for all tasks right now.

Lukas

On 10/20/15, Kartik Paramasivam <kparamasi...@linkedin.com.invalid> wrote:
> We have been wanting to do something similar at LinkedIn. However, we
> haven't thought through the details.
>
> If container == thread, then we would need to change the AppMaster to
> request the appropriate number of YARN 'containers' (processes), i.e. we
> would have to decouple the process count from yarn.Containers.Count.
>
> Basically, wouldn't we have to come up with a new setting,
> Yarn.ProcessCount?
>
> On Mon, Oct 19, 2015 at 3:49 PM, Lukas Steiblys <lu...@doubledutch.me> wrote:
>
>> I have been thinking lately about the most non-invasive way to add
>> multithreading capabilities to ThreadJobFactory, as that is the main
>> method we use to run our jobs in production. Looking at the master branch
>> code in Git, I have found the following:
>>   a. The best way would be to simply spin up a new thread for each
>>      container.
>>   b. The number of containers can already be specified using the
>>      configuration property job.container.count.
>>   c. I can construct a new SamzaContainer for each containerModel
>>      returned from coordinator.jobModel.getContainers in ThreadJobFactory.
>>   d. I can pass a list of these containers into the ThreadJob constructor,
>>      modifying it to accept an array of Runnables.
>>   e. The submit method of ThreadJob would then create and start a new
>>      thread for each Runnable.
>> This should start up a new thread for each container and group the tasks
>> using the appropriate TaskNameGrouper.
>>
>> Any ideas on what I might have missed? Are there any other potential
>> solutions? Would this be a good patch for Samza in general?
>>
>> Lukas
>>
>
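A rough sketch of steps a through e from the quoted proposal, written in Python for brevity rather than in Samza's Scala/Java. Everything here is hypothetical: `ThreadJob`, `build_thread_job`, `group_by_container_count` and `make_container` only mirror the names in the thread and are not Samza's real classes or signatures. The round-robin grouping is an assumed stand-in for whichever TaskNameGrouper the job is configured with, and `make_container` stands in for constructing a SamzaContainer from a containerModel.

```python
import threading
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins that mirror names from the thread; these are NOT
# Samza's actual ThreadJob/ThreadJobFactory classes or signatures.

Runnable = Callable[[], None]


def group_by_container_count(task_names: List[str], container_count: int) -> Dict[int, List[str]]:
    """Assumed TaskNameGrouper stand-in: deal tasks round-robin into containers."""
    if container_count < 1:
        raise ValueError("container_count (job.container.count) must be >= 1")
    groups: Dict[int, List[str]] = {cid: [] for cid in range(container_count)}
    for i, task in enumerate(sorted(task_names)):
        groups[i % container_count].append(task)
    return groups


class ThreadJob:
    """Steps d and e: accept one runnable per container, start one thread each."""

    def __init__(self, runnables: List[Runnable]) -> None:
        self.runnables = runnables
        self.threads: List[threading.Thread] = []

    def submit(self) -> "ThreadJob":
        for i, runnable in enumerate(self.runnables):
            thread = threading.Thread(target=runnable, name=f"samza-container-{i}")
            thread.start()
            self.threads.append(thread)
        return self

    def wait_for_finish(self, timeout: Optional[float] = None) -> bool:
        # Join every container thread; True only if all of them have exited.
        for thread in self.threads:
            thread.join(timeout)
        return not any(thread.is_alive() for thread in self.threads)


def build_thread_job(task_names: List[str], container_count: int,
                     make_container: Callable[[int, List[str]], Runnable]) -> ThreadJob:
    """Steps b and c: one container runnable per container model."""
    models = group_by_container_count(task_names, container_count)
    return ThreadJob([make_container(cid, tasks) for cid, tasks in models.items()])


if __name__ == "__main__":
    lock = threading.Lock()
    ran: Dict[int, List[str]] = {}

    def make_container(container_id: int, tasks: List[str]) -> Runnable:
        def run() -> None:  # a real container would loop over its tasks here
            with lock:
                ran[container_id] = tasks
        return run

    job = build_thread_job([f"Partition {p}" for p in range(5)], 2, make_container)
    print(job.submit().wait_for_finish(timeout=5.0), ran)
```

With five partitions and `container_count=2`, this starts two threads: one owns Partition 0, 2 and 4, the other Partition 1 and 3. Each container thread runs its own task group, but they all share one process, so memory and any process-wide failure are shared as well.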

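Kartik's suggestion of decoupling the number of YARN processes from the container count could work roughly as below. `container_count` plays the role of job.container.count and `process_count` the hypothetical `Yarn.ProcessCount` setting he floats; the function name and the contiguous-block assignment are assumptions for illustration only. Each YARN process would then start one thread per container ID it receives.

```python
from typing import Dict, List


def assign_containers_to_processes(container_count: int, process_count: int) -> Dict[int, List[int]]:
    """Spread container IDs over YARN processes as evenly as possible.

    container_count ~ job.container.count; process_count ~ the hypothetical
    Yarn.ProcessCount setting. Each process gets a contiguous block of IDs.
    """
    if process_count < 1 or container_count < process_count:
        raise ValueError("need 1 <= process_count <= container_count")
    base, extra = divmod(container_count, process_count)
    assignment: Dict[int, List[int]] = {}
    next_id = 0
    for proc in range(process_count):
        # The first `extra` processes take one additional container each.
        size = base + (1 if proc < extra else 0)
        assignment[proc] = list(range(next_id, next_id + size))
        next_id += size
    return assignment


print(assign_containers_to_processes(8, 3))  # {0: [0, 1, 2], 1: [3, 4, 5], 2: [6, 7]}
```

Round-robin would balance the load equally well; contiguous blocks just keep each process's containers adjacent. Rejecting `process_count > container_count` avoids asking YARN for processes that would have no containers to run.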