Re: How to run multiple instances of the same job

Telles Nobrega Wed, 20 Aug 2014 13:23:14 -0700

Chris, is there a way to completely eliminate buffering in Samza + Kafka?


On Mon, Aug 18, 2014 at 1:46 PM, Telles Nobrega <[email protected]>
wrote:

> I see. Thanks. The weird thing is that it works for some rounds and then stops.
>
>
> On Mon, Aug 18, 2014 at 1:44 PM, Chris Riccomini <
> [email protected]> wrote:
>
>> Hey Telles,
>>
>> The problem could occur with HDFS. I believe that LOCALIZING just means
>> that the NM is trying to download the artifact from wherever it is (be
>> that HTTP, HDFS, etc).
>>
>> Cheers,
>> Chris
>>
>> On 8/18/14 9:22 AM, "Telles Nobrega" <[email protected]> wrote:
>>
>> >Chris,
>> >
>> >I'm using HDFS. I will run it again to see if the problem happens, and I
>> >will post if I find any problem or have more questions.
>> >
>> >Thanks.
>> >
>> >
>> >On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <
>> >[email protected]> wrote:
>> >
>> >> Hey Telles,
>> >>
>> >> Usually, when a job is stuck in LOCALIZING, it means that YARN is
>> >> struggling to distribute your binary (the .tgz) to the appropriate
>> >> NodeManagers, I think. You should check your NM logs and see if there
>> >> are any hints about what's going on there.
>> >>
>> >> I've seen this in the past when the NM hangs trying to download a .tgz
>> >> from the HTTP server for some reason.
>> >>
>> >> Cheers,
>> >> Chris
>> >>
>> >> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]> wrote:
>> >>
>> >> >I was able to fix this problem; now I'm having another one. I'm using
>> >> >a script that starts Kafka, deploys Samza jobs, stops them, kills Kafka,
>> >> >and deletes the configurations in ZooKeeper and the kafka-log files. Then
>> >> >it starts over again. I see that sometimes jobs don't start running; they
>> >> >stay in the ACCEPTED state with info LOCALIZING. What can be the cause of
>> >> >that?
>> >> >
>> >> >Thanks.
>> >> >On 15 Aug 2014, at 19:18, Chris Riccomini
>> >> ><[email protected]> wrote:
>> >> >
>> >> >> Hey Telles,
>> >> >>
>> >> >> If you set yarn.container.count to 5, you should get 5 containers.
>> >> >> The two cases where you don't are:
>> >> >>
>> >> >> 1. The grid is at capacity, and doesn't have the memory to fulfill
>> >> >> all container requests.
>> >> >> 2. You set yarn.container.count higher than the number of partitions
>> >> >> that your input stream has.
>> >> >>
>> >> >> Cheers,
>> >> >> Chris
>> >> >>
>> >> >> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
>> >> >>
>> >> >>> Hi Chris,
>> >> >>>
>> >> >>> I started playing with the yarn.container.count and set it to 5.
>> >> >>>
>> >> >>> At first I thought I had to compile the package again and republish
>> >> >>> it to HDFS because I couldn't run 5 containers.
>> >> >>> Then I recompiled, but I still only got 3 containers. Is that normal
>> >> >>> behaviour?
>> >> >>>
>> >> >>> Thanks.
>> >> >>>
>> >> >>>
>> >> >>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega
>> >> >>><[email protected]>
>> >> >>> wrote:
>> >> >>>
>> >> >>>> Thanks Chris, I will take a look at these links and come back if I
>> >> >>>> have more questions.
>> >> >>>>
>> >> >>>>
>> >> >>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <
>> >> >>>> [email protected]> wrote:
>> >> >>>>
>> >> >>>>> Hey Telles,
>> >> >>>>>
>> >> >>>>>>> Should I use many kafka brokers or one will suffice?
>> >> >>>>>
>> >> >>>>> The number of brokers you use is dependent on the number of
>> >> >>>>> messages/sec you're going to receive, the size of those messages,
>> >> >>>>> and how long you're going to retain them.
>> >> >>>>>
>> >> >>>>> Here is a good blog post on Kafka performance that should give you
>> >> >>>>> some idea of the numbers:
>> >> >>>>>
>> >> >>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>> >> >>>>>
>> >> >>>>>>> It could be just one job, but what is the best way to deploy many
>> >> >>>>>>> instances of this job so I could process a heavy load of messages?
>> >> >>>>>
>> >> >>>>> You should adjust the yarn.container.count to increase the
>> >> >>>>> parallelism of your job. By default, you get one container, but you
>> >> >>>>> can adjust this up to the total number of input partitions that you
>> >> >>>>> have. Have a look here for some details about how Samza's
>> >> >>>>> parallelism works:
>> >> >>>>>
>> >> >>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> Cheers,
>> >> >>>>> Chris
>> >> >>>>>
>> >> >>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>> >> >>>>>
>> >> >>>>>> Should I use many Kafka brokers, or will one suffice?
>> >> >>>>>>
>> >> >>>>>> Thanks
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega
>> >> >>>>>> <[email protected]> wrote:
>> >> >>>>>>
>> >> >>>>>>> It could be just one job, but what is the best way to deploy many
>> >> >>>>>>> instances of this job so I could process a heavy load of messages?
>> >> >>>>>>>
>> >> >>>>>>> Thanks,
>> >> >>>>>>>
>> >> >>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>> >> >>>>>>>
>> >> >>>>>>>> *"Does one kafka-broker handle this many messages per second?"*
>> >> >>>>>>>>
>> >> >>>>>>>> I believe @Chris has a better answer about this.
>> >> >>>>>>>>
>> >> >>>>>>>> *"I have one job that gets these messages and another that reads
>> >> >>>>>>>> from the output of the first job and does some more processing."*
>> >> >>>>>>>>
>> >> >>>>>>>>   Why not use one job to get the messages and process them?
>> >> >>>>>>>>
>> >> >>>>>>>> *"when I change a configuration of one of my jobs, do I need to
>> >> >>>>>>>> recompile it and send the new tar.gz to HDFS, or can I just change
>> >> >>>>>>>> the deploy/samza config and it should work?"*
>> >> >>>>>>>>
>> >> >>>>>>>>   No, you don't need to recompile. Change the config and run-job.
>> >> >>>>>>>>   It will work.
>> >> >>>>>>>>
>> >> >>>>>>>> Thanks.
>> >> >>>>>>>>
>> >> >>>>>>>> Cheers,
>> >> >>>>>>>>
>> >> >>>>>>>> Fang, Yan
>> >> >>>>>>>> [email protected]
>> >> >>>>>>>> +1 (206) 849-4108
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega
>> >> >>>>>>>> <[email protected]> wrote:
>> >> >>>>>>>>
>> >> >>>>>>>>> Not completely related to the topic of the question, but when I
>> >> >>>>>>>>> change a configuration of one of my jobs, do I need to recompile
>> >> >>>>>>>>> it and send the new tar.gz to HDFS, or can I just change the
>> >> >>>>>>>>> deploy/samza config and it should work?
>> >> >>>>>>>>>
>> >> >>>>>>>>> Thanks
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega
>> >> >>>>>>>>> <[email protected]> wrote:
>> >> >>>>>>>>>
>> >> >>>>>>>>>> Hi, I'm running an experiment in which I'm supposed to run Samza
>> >> >>>>>>>>>> with different input rates. First I'm running with 420
>> >> >>>>>>>>>> messages/second, and I scale up to 33200 messages/second.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Does one kafka-broker handle this many messages per second?
>> >> >>>>>>>>>> Second, what is the best way to read this many messages into
>> >> >>>>>>>>>> Samza? I have one job that gets these messages and another that
>> >> >>>>>>>>>> reads from the output of the first job and does some more
>> >> >>>>>>>>>> processing. Is the best way to use more containers and split the
>> >> >>>>>>>>>> Kafka topics into partitions (the same number as containers), or
>> >> >>>>>>>>>> is there a better way to do this?
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Thanks in advance,
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> --
>> >> >>>>>>>>>> ------------------------------------------
>> >> >>>>>>>>>> Telles Mota Vidal Nobrega
>> >> >>>>>>>>>> M.sc. Candidate at UFCG
>> >> >>>>>>>>>> B.sc. in Computer Science at UFCG
>> >> >>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >> >>>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>> --
>> >> >>>>>>>>> ------------------------------------------
>> >> >>>>>>>>> Telles Mota Vidal Nobrega
>> >> >>>>>>>>> M.sc. Candidate at UFCG
>> >> >>>>>>>>> B.sc. in Computer Science at UFCG
>> >> >>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >> >>>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> --
>> >> >>>>>> ------------------------------------------
>> >> >>>>>> Telles Mota Vidal Nobrega
>> >> >>>>>> M.sc. Candidate at UFCG
>> >> >>>>>> B.sc. in Computer Science at UFCG
>> >> >>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >> >>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> --
>> >> >>>> ------------------------------------------
>> >> >>>> Telles Mota Vidal Nobrega
>> >> >>>> M.sc. Candidate at UFCG
>> >> >>>> B.sc. in Computer Science at UFCG
>> >> >>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> ------------------------------------------
>> >> >>> Telles Mota Vidal Nobrega
>> >> >>> M.sc. Candidate at UFCG
>> >> >>> B.sc. in Computer Science at UFCG
>> >> >>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >> >
>> >>
>> >>
>> >
>> >
>> >--
>> >------------------------------------------
>> >Telles Mota Vidal Nobrega
>> >M.sc. Candidate at UFCG
>> >B.sc. in Computer Science at UFCG
>> >Software Engineer at OpenStack Project - HP/LSD-UFCG
>>
>>
>
>
> --
> ------------------------------------------
> Telles Mota Vidal Nobrega
> M.sc. Candidate at UFCG
> B.sc. in Computer Science at UFCG
> Software Engineer at OpenStack Project - HP/LSD-UFCG
>



-- 
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
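
The container-count rule Chris describes earlier in the thread (you get
yarn.container.count containers unless the grid lacks capacity or the input
stream has fewer partitions) can be roughed out as below. This is only a
sketch of that rule, not Samza's or YARN's actual scheduling code: the
function name and the grid_capacity parameter are hypothetical, and real
capacity depends on memory settings that are not modelled here.

```python
def effective_container_count(requested, input_partitions, grid_capacity=None):
    """Estimate how many containers a job ends up with.

    requested        -- the value set for yarn.container.count
    input_partitions -- number of partitions in the input stream
    grid_capacity    -- hypothetical: how many containers the grid has
                        memory for (None means "assume no limit")
    """
    if requested < 1 or input_partitions < 1:
        raise ValueError("requested and input_partitions must be >= 1")

    # Case 2 from the thread: asking for more containers than partitions
    # does not help, so the partition count acts as a ceiling.
    count = min(requested, input_partitions)

    # Case 1 from the thread: a grid at capacity cannot fulfil every request.
    if grid_capacity is not None:
        count = min(count, grid_capacity)
    return count


if __name__ == "__main__":
    # Requesting 5 containers against a 3-partition input stream.
    print(effective_container_count(5, 3))
```

Under this rule, requesting 5 containers against a 3-partition input would
yield 3, which would be consistent with the 3 containers reported above,
although the thread never states the actual partition count.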
