Re: How to run multiple instances of the same job

Chris Riccomini Mon, 18 Aug 2014 09:45:31 -0700

Hey Telles,

The problem could occur with HDFS. I believe that LOCALIZING just means
that the NM is trying to download the artifact from wherever it is (be
that HTTP, HDFS, etc).


Cheers,
Chris

On 8/18/14 9:22 AM, "Telles Nobrega" <[email protected]> wrote:

>Chris,
>
>I'm using HDFS. I will run it again to see if the problem happens, and I will
>post if I find any problem or have more questions.
>
>Thanks.
>
>
>On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <
>[email protected]> wrote:
>
>> Hey Telles,
>>
>> Usually, when a job is stuck in LOCALIZING, it means that YARN is
>> struggling to distribute your binary (the .tgz) to the appropriate
>> NodeManagers, I think. You should check your NM logs and see if there are
>> any hints about what's going on there.
>>
>> I've seen this in the past when the NM hangs trying to download a .tgz
>> from the HTTP server for some reason.
>>
>> Cheers,
>> Chris
>>
>> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]> wrote:
>>
>> >I was able to fix this problem, but now I'm having another one. I'm using a
>> >script that starts Kafka, deploys Samza jobs, stops them, kills Kafka, and
>> >deletes the configurations in ZooKeeper and the kafka-log files. Then it
>> >starts over again. I see that sometimes jobs don't start running; they stay
>> >in the ACCEPTED state with info LOCALIZING. What could be the cause of that?
>> >
>> >Thanks.
>> >On 15 Aug 2014, at 19:18, Chris Riccomini
>> ><[email protected]> wrote:
>> >
>> >> Hey Telles,
>> >>
>> >> If you set yarn.container.count to 5, you should get 5 containers. The
>> >> two cases where you don't are:
>> >>
>> >> 1. The grid is at capacity, and doesn't have the memory to fulfill all
>> >> container requests.
>> >> 2. You set yarn.container.count higher than the number of partitions that
>> >> your input stream has.
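
A rough model of the two cases above, as a Python sketch. The function name and the idea of expressing grid capacity as "how many containers currently fit" are assumptions made for illustration; this is not Samza's or YARN's actual allocation logic.

```python
def expected_containers(requested: int, input_partitions: int, grid_free_slots: int) -> int:
    """Estimate how many containers a job ends up with.

    requested        -- the value of yarn.container.count
    input_partitions -- number of partitions in the input stream
    grid_free_slots  -- hypothetical: containers the grid has memory for right now
    """
    if min(requested, input_partitions, grid_free_slots) < 0:
        raise ValueError("all arguments must be non-negative")
    # Case 1: the grid is at capacity, so fewer containers fit.
    # Case 2: the request exceeds the number of input partitions.
    return min(requested, input_partitions, grid_free_slots)


if __name__ == "__main__":
    # Five requested, plenty of partitions and capacity.
    print(expected_containers(5, 8, 10))
    # Five requested, but the input stream has only three partitions.
    print(expected_containers(5, 3, 10))
    # Five requested, but the grid only has room for three.
    print(expected_containers(5, 8, 3))
```

Under this model, either a 3-partition input or a grid with room for only 3 containers would be consistent with the 3 containers reported earlier in the thread.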
>> >>
>> >> Cheers,
>> >> Chris
>> >>
>> >> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
>> >>
>> >>> Hi Chris,
>> >>>
>> >>> I started playing with yarn.container.count and set it to 5.
>> >>>
>> >>> At first I thought I had to compile the package again and republish it to
>> >>> HDFS because I couldn't run 5 containers.
>> >>> Then I recompiled, but I still only got 3 containers. Is that normal
>> >>> behaviour?
>> >>>
>> >>> Thanks.
>> >>>
>> >>>
>> >>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]>
>> >>> wrote:
>> >>>
>> >>>> Thanks Chris, I will take a look at these links and I will come back if
>> >>>> I have more questions.
>> >>>>
>> >>>>
>> >>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <
>> >>>> [email protected]> wrote:
>> >>>>
>> >>>>> Hey Telles,
>> >>>>>
>> >>>>>>> Should I use many kafka brokers or one will suffice?
>> >>>>>
>> >>>>> The number of brokers you use is dependent on the number of
>> >>>>> messages/sec you're going to receive, the size of those messages, and
>> >>>>> how long you're going to retain them.
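
To make those three factors concrete, here is a back-of-the-envelope Python sketch. Only the 33200 messages/second rate comes from this thread; the 100-byte average message size and the 7-day retention period are assumptions picked for illustration, and replication is ignored.

```python
def kafka_load(messages_per_sec: int, avg_message_bytes: int, retention_seconds: int):
    """Return (ingest bytes per second, bytes held on disk at steady state)."""
    bytes_per_sec = messages_per_sec * avg_message_bytes
    retained_bytes = bytes_per_sec * retention_seconds
    return bytes_per_sec, retained_bytes


SEVEN_DAYS = 7 * 24 * 60 * 60  # assumed retention period, in seconds

if __name__ == "__main__":
    # 33200 messages/second is the peak rate mentioned in the thread;
    # 100 bytes per message is an assumed average size.
    rate, retained = kafka_load(33200, 100, SEVEN_DAYS)
    print(f"ingest: {rate / 1e6:.2f} MB/s, retained: {retained / 1e12:.2f} TB")
```

Comparing numbers like these against the figures in a published benchmark gives a first estimate of whether a single broker is enough for a given workload.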
>> >>>>>
>> >>>>> Here is a good blog post on Kafka performance that should give you some
>> >>>>> idea of the numbers:
>> >>>>>
>> >>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>> >>>>>
>> >>>>>>> It could be just one job, but what is the best way to deploy many
>> >>>>>>> instances of this job so I could process a heavy load of messages?
>> >>>>>
>> >>>>> You should adjust the yarn.container.count to increase the parallelism
>> >>>>> of your job. By default, you get one container, but you can adjust this
>> >>>>> up to the total number of input partitions that you have. Have a look
>> >>>>> here for some details about how Samza's parallelism works:
>> >>>>>
>> >>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>> >>>>>
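
The reason the partition count caps useful parallelism can be sketched in Python. The round-robin assignment below is an assumption chosen for illustration, and `assign_partitions` is a hypothetical helper, not Samza's API; see the concepts page linked above for how Samza actually groups partitions into containers.

```python
def assign_partitions(num_partitions: int, container_count: int):
    """Spread input partitions over containers round-robin (illustrative only)."""
    if container_count < 1:
        raise ValueError("container_count must be at least 1")
    containers = [[] for _ in range(container_count)]
    for partition in range(num_partitions):
        containers[partition % container_count].append(partition)
    return containers


if __name__ == "__main__":
    # Four partitions over two containers: each container gets two partitions.
    print(assign_partitions(4, 2))
    # Three partitions over five containers: two containers have nothing to read.
    print(assign_partitions(3, 5))
```

Once there are more containers than partitions, the extra containers receive no input, so raising yarn.container.count past the partition count adds no throughput in this model.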
>> >>>>> Cheers,
>> >>>>> Chris
>> >>>>>
>> >>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>> >>>>>
>> >>>>>> Should I use many kafka brokers or one will suffice?
>> >>>>>>
>> >>>>>> Thanks
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> It could be just one job, but what is the best way to deploy many
>> >>>>>>> instances of this job so I could process a heavy load of messages?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>>
>> >>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>> >>>>>>>
>> >>>>>>>> *"Does one kafka-broker handle this many messages per second?"*
>> >>>>>>>>
>> >>>>>>>> I believe @Chris has a better answer about this.
>> >>>>>>>>
>> >>>>>>>> *"I have one job that gets these messages and another that reads from
>> >>>>>>>> the output of the first job and does some more processing."*
>> >>>>>>>>
>> >>>>>>>>   Why not use one job to get the messages and process them?
>> >>>>>>>>
>> >>>>>>>> *"when I change a configuration of one of my jobs, do I need to
>> >>>>>>>> recompile it and send the new tar.gz to HDFS, or can I just change
>> >>>>>>>> the deploy/samza config and it should work?"*
>> >>>>>>>>
>> >>>>>>>>   No, you don't need to recompile. Change the config and run-job. It
>> >>>>>>>> will work.
>> >>>>>>>>
>> >>>>>>>> Thanks.
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>>
>> >>>>>>>> Fang, Yan
>> >>>>>>>> [email protected]
>> >>>>>>>> +1 (206) 849-4108
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Not completely related to the topic of the question, but when I
>> >>>>>>>>> change a configuration of one of my jobs, do I need to recompile it
>> >>>>>>>>> and send the new tar.gz to HDFS, or can I just change the
>> >>>>>>>>> deploy/samza config and it should work?
>> >>>>>>>>>
>> >>>>>>>>> Thanks
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hi, I'm running an experiment in which I'm supposed to run Samza
>> >>>>>>>>>> with different input rates. First I'm running with 420
>> >>>>>>>>>> messages/second, and I scale up to 33200 messages/second.
>> >>>>>>>>>>
>> >>>>>>>>>> Does one kafka-broker handle this many messages per second?
>> >>>>>>>>>> Second, what is the best way to read this many messages into Samza?
>> >>>>>>>>>> I have one job that gets these messages and another that reads from
>> >>>>>>>>>> the output of the first job and does some more processing. Is the
>> >>>>>>>>>> best way to use more containers and split the Kafka topics into
>> >>>>>>>>>> partitions (the same number as containers), or is there a better
>> >>>>>>>>>> way to do this?
>> >>>>>>>>>>
>> >>>>>>>>>> Thanks in advance,
>> >>>>>>>>>>
>> >>>>>>>>>> --
>> >>>>>>>>>> ------------------------------------------
>> >>>>>>>>>> Telles Mota Vidal Nobrega
>> >>>>>>>>>> M.sc. Candidate at UFCG
>> >>>>>>>>>> B.sc. in Computer Science at UFCG
>> >>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> ------------------------------------------
>> >>>>>>>>> Telles Mota Vidal Nobrega
>> >>>>>>>>> M.sc. Candidate at UFCG
>> >>>>>>>>> B.sc. in Computer Science at UFCG
>> >>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> ------------------------------------------
>> >>>>>> Telles Mota Vidal Nobrega
>> >>>>>> M.sc. Candidate at UFCG
>> >>>>>> B.sc. in Computer Science at UFCG
>> >>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> ------------------------------------------
>> >>>> Telles Mota Vidal Nobrega
>> >>>> M.sc. Candidate at UFCG
>> >>>> B.sc. in Computer Science at UFCG
>> >>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> ------------------------------------------
>> >>> Telles Mota Vidal Nobrega
>> >>> M.sc. Candidate at UFCG
>> >>> B.sc. in Computer Science at UFCG
>> >>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >
>>
>>
>
>
>-- 
>------------------------------------------
>Telles Mota Vidal Nobrega
>M.sc. Candidate at UFCG
>B.sc. in Computer Science at UFCG
>Software Engineer at OpenStack Project - HP/LSD-UFCG
