Re: How to run multiple instances of the same job

Telles Nobrega Fri, 15 Aug 2014 13:57:51 -0700

Hi Chris,

I started playing with yarn.container.count and set it to 5.

At first I thought I had to recompile the package and republish it to HDFS,
because I couldn't get 5 containers running. I recompiled, but I still only
got 3 containers. Is that normal behaviour?
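
As a reference point, Chris's note quoted below says the container count can
be raised only up to the total number of input partitions. The following is a
minimal sketch of that rule, not Samza's actual code: the function names and
the round-robin grouping of partitions are assumptions made for illustration.
Under this reading, if the input topic had, say, 3 partitions, a request for
5 containers would be capped at 3.

```python
from typing import List


def effective_containers(requested: int, input_partitions: int) -> int:
    """Cap the requested container count at the number of input partitions.

    Hypothetical helper: it only mirrors the rule stated in the thread
    (parallelism can grow up to the number of input partitions).
    """
    if requested < 1 or input_partitions < 1:
        raise ValueError("both counts must be at least 1")
    return min(requested, input_partitions)


def assign_partitions(input_partitions: int, requested: int) -> List[List[int]]:
    """Spread partition ids over the effective containers.

    Round-robin grouping is an assumption for illustration only.
    """
    n = effective_containers(requested, input_partitions)
    groups: List[List[int]] = [[] for _ in range(n)]
    for partition in range(input_partitions):
        groups[partition % n].append(partition)
    return groups


if __name__ == "__main__":
    # yarn.container.count=5 with a 3-partition input topic
    print(effective_containers(5, 3))
    # the same request with an 8-partition input topic
    print(assign_partitions(8, 5))
```

With more partitions than requested containers, every container gets at least
one partition, so the full requested count can be used.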

Thanks.


On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]>
wrote:

> Thanks Chris, I will take a look at these links and come back if I
> have more questions.
>
>
> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <
> [email protected]> wrote:
>
>> Hey Telles,
>>
>> >> Should I use many kafka brokers or one will suffice?
>>
>> The number of brokers you use is dependent on the number of messages/sec
>> you're going to receive, the size of those messages, and how long you're
>> going to retain them.
>>
>> Here is a good blog post on Kafka performance that should give you some
>> idea of the numbers:
>>
>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>>
>> >> It could be just one job, but what is the best way to deploy many
>> >>instances of this job so I could process a heavy load of messages?
>>
>> You should adjust the yarn.container.count to increase the parallelism of
>> your job. By default, you get one container, but you can adjust this up to
>> the total number of input partitions that you have. Have a look here for
>> some details about how Samza's parallelism works:
>>
>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>>
>> Cheers,
>> Chris
>>
>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>>
>> >Should I use many Kafka brokers, or will one suffice?
>> >
>> >Thanks
>> >
>> >
>> >On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]> wrote:
>> >
>> >> It could be just one job, but what is the best way to deploy many
>> >> instances of this job so I could process a heavy load of messages?
>> >>
>> >> Thanks,
>> >>
>> >> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>> >>
>> >> > *"Can one Kafka broker handle this many messages per second?"*
>> >> >
>> >> >    I believe @Chris has a better answer about this.
>> >> >
>> >> >
>> >> >
>> >> > *"I have one job that gets these messages and another that reads from the
>> >> > output of the first job and does some more processing."*
>> >> >
>> >> >    Why not use one job to get the messages and process them?
>> >> >
>> >> > *"When I change a configuration of one of my jobs, do I need to recompile it
>> >> > and send the new tar.gz to HDFS, or can I just change the deploy/samza
>> >> > config and it should work?"*
>> >> >
>> >> >    No, you don't need to recompile. Change the config and run-job. It will
>> >> > work.
>> >> >
>> >> > Thanks.
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Fang, Yan
>> >> > [email protected]
>> >> > +1 (206) 849-4108
>> >> >
>> >> >
>> >> > On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]> wrote:
>> >> >
>> >> >> Not completely related to the topic of the question, but when I change a
>> >> >> configuration of one of my jobs, do I need to recompile it and send the
>> >> >> new tar.gz to HDFS, or can I just change the deploy/samza config and it
>> >> >> should work?
>> >> >>
>> >> >> Thanks
>> >> >>
>> >> >>
>> >> >> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]> wrote:
>> >> >>
>> >> >>> Hi, I'm running an experiment in which I'm supposed to run Samza with
>> >> >>> different input rates. I start at 420 messages/second and scale up to
>> >> >>> 33200 messages/second.
>> >> >>>
>> >> >>> Can one Kafka broker handle this many messages per second?
>> >> >>> Second, what is the best way to read this many messages into Samza? I have
>> >> >>> one job that gets these messages and another that reads from the output of
>> >> >>> the first job and does some more processing. Is the best way to use more
>> >> >>> containers and split the Kafka topics into partitions (the same number as
>> >> >>> containers), or is there a better way to do this?
>> >> >>>
>> >> >>> Thanks in advance,
>> >> >>>
>> >> >>> --
>> >> >>> ------------------------------------------
>> >> >>> Telles Mota Vidal Nobrega
>> >> >>> M.sc. Candidate at UFCG
>> >> >>> B.sc. in Computer Science at UFCG
>> >> >>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >> >>>
>> >> >>
>> >>
>> >>
>> >
>> >
>>
>>
>
>
>



-- 
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
