Re: How to run multiple instances of the same job

Chris Riccomini Fri, 15 Aug 2014 15:19:58 -0700

Hey Telles,

If you set yarn.container.count to 5, you should get 5 containers. The two
cases where you don't are:


1. The grid is at capacity, and doesn't have the memory to fulfill all
container requests.
2. You set yarn.container.count higher than the number of partitions that
your input stream has.
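
As a rough mental model of the two cases above (a sketch only, not
Samza's actual allocation code), the number of containers you end up
with is capped by whichever limit is smallest. The function name and the
grid_capacity parameter are made up for illustration; grid_capacity
stands for however many containers the grid currently has memory for.

```python
def expected_containers(requested, input_partitions, grid_capacity):
    """Rough model: the smallest of the three limits wins.

    requested        -- the value of yarn.container.count
    input_partitions -- number of partitions in the input stream
    grid_capacity    -- hypothetical: containers the grid can still fit
    """
    return min(requested, input_partitions, grid_capacity)


# Asking for 5 containers against an input stream with 3 partitions
# (numbers chosen only as an example) yields 3 under this model.
print(expected_containers(requested=5, input_partitions=3, grid_capacity=10))
```

If the model matches what you see, check the partition count of your
input topic and the free memory on the grid before raising
yarn.container.count further.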

Cheers,
Chris

On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:

>Hi Chris,
>
>I started playing with yarn.container.count and set it to 5.
>
>At first I thought I had to recompile the package and republish it to HDFS
>because I couldn't run 5 containers. I recompiled, but I still only got 3
>containers. Is that normal behaviour?
>
>Thanks.
>
>
>On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]>
>wrote:
>
>> Thanks Chris, I will take a look at these links and come back if I
>> have more questions.
>>
>>
>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <
>> [email protected]> wrote:
>>
>>> Hey Telles,
>>>
>>> >> Should I use many kafka brokers or one will suffice?
>>>
>>> The number of brokers you use depends on the number of messages/sec
>>> you're going to receive, the size of those messages, and how long
>>> you're going to retain them.
>>>
>>> Here is a good blog post on Kafka performance that should give you some
>>> idea of the numbers:
>>>
>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>>>
>>> >> It could be just one job, but what is the best way to deploy many
>>> >>instances of this job so I could process a heavy load of messages?
>>>
>>> You should adjust yarn.container.count to increase the parallelism of
>>> your job. By default, you get one container, but you can adjust this up
>>> to the total number of input partitions that you have. Have a look here
>>> for some details about how Samza's parallelism works:
>>>
>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>>>
>>> Cheers,
>>> Chris
>>>
>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>>>
>>> >Should I use many Kafka brokers, or will one suffice?
>>> >
>>> >Thanks
>>> >
>>> >
>>> >On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]>
>>> >wrote:
>>> >
>>> >> It could be just one job, but what is the best way to deploy many
>>> >> instances of this job so I could process a heavy load of messages?
>>> >>
>>> >> Thanks,
>>> >>
>>> >> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>>> >>
>>> >> > *"Does one kafka-broker handle this much messages per second?"*
>>> >> >
>>> >> >  I believe @Chris has better answer about this.
>>> >> >
>>> >> >
>>> >> >
>>> >> > *"I have one job that get this messages and another that reads from
>>> >> > the output of the first job that does some more processing."*
>>> >> >
>>> >> >    Why not use one job to get the messages and process them?
>>> >> >
>>> >> > *"when I change a configuration of one my jobs do I need to recompile
>>> >> > it and send the new tar.gz to hdfs or just change the deploy/samza
>>> >> > config and it should work."*
>>> >> >
>>> >> >    No, you don't need to recompile. Change the config and run-job. It
>>> >> > will work.
>>> >> >
>>> >> > Thanks.
>>> >> >
>>> >> > Cheers,
>>> >> >
>>> >> > Fang, Yan
>>> >> > [email protected]
>>> >> > +1 (206) 849-4108
>>> >> >
>>> >> >
>>> >> > On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega
>>> >><[email protected]
>>> >> >
>>> >> > wrote:
>>> >> >
>>> >> >> Not completely related to the topic of the question, but when I
>>> >> >> change a configuration of one of my jobs, do I need to recompile it
>>> >> >> and send the new tar.gz to HDFS, or can I just change the
>>> >> >> deploy/samza config and it should work?
>>> >> >>
>>> >> >> Thanks
>>> >> >>
>>> >> >>
>>> >> >> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <
>>> >> [email protected]>
>>> >> >> wrote:
>>> >> >>
>>> >> >>> Hi, I'm running an experiment in which I'm supposed to run Samza
>>> >> >>> with different input rates. First I'm running with 420
>>> >> >>> messages/second, and I scale up to 33200 messages/second.
>>> >> >>>
>>> >> >>> Does one Kafka broker handle this many messages per second?
>>> >> >>> Second, what is the best way to read this many messages into Samza?
>>> >> >>> I have one job that gets these messages and another that reads from
>>> >> >>> the output of the first job and does some more processing. Is the
>>> >> >>> best way to use more containers and split the Kafka topics into
>>> >> >>> partitions (the same number as containers), or is there a better
>>> >> >>> way to do this?
>>> >> >>>
>>> >> >>> Thanks in advance,
>>> >> >>>
>>> >> >>> --
>>> >> >>> ------------------------------------------
>>> >> >>> Telles Mota Vidal Nobrega
>>> >> >>> M.sc. Candidate at UFCG
>>> >> >>> B.sc. in Computer Science at UFCG
>>> >> >>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>>> >> >>>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> ------------------------------------------
>>> >> >> Telles Mota Vidal Nobrega
>>> >> >> M.sc. Candidate at UFCG
>>> >> >> B.sc. in Computer Science at UFCG
>>> >> >> Software Engineer at OpenStack Project - HP/LSD-UFCG
>>> >> >>
>>> >>
>>> >>
>>> >
>>> >
>>> >--
>>> >------------------------------------------
>>> >Telles Mota Vidal Nobrega
>>> >M.sc. Candidate at UFCG
>>> >B.sc. in Computer Science at UFCG
>>> >Software Engineer at OpenStack Project - HP/LSD-UFCG
>>>
>>>
>>
>>
>> --
>> ------------------------------------------
>> Telles Mota Vidal Nobrega
>> M.sc. Candidate at UFCG
>> B.sc. in Computer Science at UFCG
>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>>
>
>
>
>-- 
>------------------------------------------
>Telles Mota Vidal Nobrega
>M.sc. Candidate at UFCG
>B.sc. in Computer Science at UFCG
>Software Engineer at OpenStack Project - HP/LSD-UFCG
