Re: How to run multiple instances of the same job

Chris Riccomini Fri, 22 Aug 2014 08:03:37 -0700

Hey Telles,

>> To increase this number I'm using many producers, but it seems like Kafka
>> is not accepting them all.


When you say Kafka is not accepting them, what do you mean? Kafka
generally doesn't reject messages unless the size of the message that
you're sending is too large (message.max.bytes in
http://kafka.apache.org/documentation.html#brokerconfigs).
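
As a rough illustration of that size check, here is a minimal sketch in
Python. The 1000000-byte limit and the helper name `oversized` are
assumptions made for the example; read the real value of message.max.bytes
from your own broker config. The broker also counts some per-message
overhead, so treat the comparison as approximate.

```python
# Assumed limit for illustration only; use the message.max.bytes value
# from your own broker configuration.
ASSUMED_MESSAGE_MAX_BYTES = 1_000_000


def oversized(payloads, limit=ASSUMED_MESSAGE_MAX_BYTES):
    """Return the indices of payloads whose byte length exceeds the limit.

    `oversized` is a hypothetical helper, not part of any Kafka client.
    """
    return [i for i, payload in enumerate(payloads) if len(payload) > limit]


# Three already-serialized messages; only the second is above the limit.
messages = [b"x" * 10, b"y" * 2_000_000, "hello".encode("utf-8")]
print(oversized(messages))
```

If this prints an empty list for your real payloads, message size is
probably not the reason messages appear to go missing.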

Cheers,
Chris

On 8/21/14 4:05 PM, "Telles Nobrega" <[email protected]> wrote:

>Thanks. So I need to send lots of messages to Kafka, and I'm using a producer
>that connects to Kafka to send them. To increase this number I'm using many
>producers, but it seems like Kafka is not accepting them all. Is there a way
>to work around this? I need something like 30000 messages per second.
>
>Thanks
>
>
>On Wed, Aug 20, 2014 at 6:47 PM, Chris Riccomini <
>[email protected]> wrote:
>
>> Hey Telles,
>>
>> The Samza job can be configured to disable batching and use sync sends:
>>
>> systems.kafka.producer.producer.type=sync
>> systems.kafka.producer.batch.num.messages=1
>>
>> This is how the hello-samza job works. :)
>>
>>
>> Note that it will dramatically reduce your throughput, but if you're doing
>> this, you probably have a low-throughput topic anyway.
>>
>> Cheers,
>> Chris
>>
>> On 8/20/14 1:21 PM, "Telles Nobrega" <[email protected]> wrote:
>>
>> >Chris, is there a way to completely eliminate buffering in Samza + Kafka?
>> >
>> >
>> >On Mon, Aug 18, 2014 at 1:46 PM, Telles Nobrega <[email protected]> wrote:
>> >
>> >> I see. Thanks. The weird thing is that it works for some rounds and then stops.
>> >>
>> >>
>> >> On Mon, Aug 18, 2014 at 1:44 PM, Chris Riccomini <
>> >> [email protected]> wrote:
>> >>
>> >>> Hey Telles,
>> >>>
>> >>> The problem could occur with HDFS. I believe that LOCALIZING just means
>> >>> that the NM is trying to download the artifact from wherever it is (be
>> >>> that HTTP, HDFS, etc).
>> >>>
>> >>> Cheers,
>> >>> Chris
>> >>>
>> >>> On 8/18/14 9:22 AM, "Telles Nobrega" <[email protected]> wrote:
>> >>>
>> >>> >Chris,
>> >>> >
>> >>> >I'm using HDFS. I will run it again to see if the problem happens, and I
>> >>> >will post if I find any problem or have more questions.
>> >>> >
>> >>> >Thanks.
>> >>> >
>> >>> >
>> >>> >On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <
>> >>> >[email protected]> wrote:
>> >>> >
>> >>> >> Hey Telles,
>> >>> >>
>> >>> >> Usually, when a job is stuck in LOCALIZING, it means that YARN is
>> >>> >> struggling to distribute your binary (the .tgz) to the appropriate
>> >>> >> NodeManagers, I think. You should check your NM logs and see if there
>> >>> >> are any hints about what's going on there.
>> >>> >>
>> >>> >> I've seen this in the past when the NM hangs trying to download a .tgz
>> >>> >> from the HTTP server for some reason.
>> >>> >>
>> >>> >> Cheers,
>> >>> >> Chris
>> >>> >>
>> >>> >> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]> wrote:
>> >>> >>
>> >>> >> >I was able to fix this problem, but now I'm having another one. I'm using
>> >>> >> >a script that starts Kafka, deploys Samza jobs, stops them, kills Kafka,
>> >>> >> >and deletes the configurations in ZooKeeper and the kafka-log files. Then
>> >>> >> >it starts over again. I see that sometimes jobs don't start running; they
>> >>> >> >stay in the accepted state with info LOCALIZING. What can be the cause of
>> >>> >> >that?
>> >>> >> >
>> >>> >> >Thanks.
>> >>> >> >On 15 Aug 2014, at 19:18, Chris Riccomini
>> >>> >> ><[email protected]> wrote:
>> >>> >> >
>> >>> >> >> Hey Telles,
>> >>> >> >>
>> >>> >> >> If you set yarn.container.count to 5, you should get 5 containers. The
>> >>> >> >> two cases where you don't are:
>> >>> >> >>
>> >>> >> >> 1. The grid is at capacity, and doesn't have the memory to fulfill all
>> >>> >> >> container requests.
>> >>> >> >> 2. You set yarn.container.count higher than the number of partitions
>> >>> >> >> that your input stream has.
>> >>> >> >>
>> >>> >> >> Cheers,
>> >>> >> >> Chris
>> >>> >> >>
>> >>> >> >> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
>> >>> >> >>
>> >>> >> >>> Hi Chris,
>> >>> >> >>>
>> >>> >> >>> I started playing with the yarn.container.count and set it to 5.
>> >>> >> >>>
>> >>> >> >>> At first I thought I had to compile the package again and republish
>> >>> >> >>> it to HDFS because I couldn't run 5 containers.
>> >>> >> >>> Then I recompiled, but I still only got 3 containers. Is that normal
>> >>> >> >>> behaviour?
>> >>> >> >>>
>> >>> >> >>> Thanks.
>> >>> >> >>>
>> >>> >> >>>
>> >>> >> >>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]> wrote:
>> >>> >> >>>
>> >>> >> >>>> Thanks Chris, I will take a look at these links and I will come back
>> >>> >> >>>> if I have more questions.
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <
>> >>> >> >>>> [email protected]> wrote:
>> >>> >> >>>>
>> >>> >> >>>>> Hey Telles,
>> >>> >> >>>>>
>> >>> >> >>>>>>> Should I use many kafka brokers or one will suffice?
>> >>> >> >>>>>
>> >>> >> >>>>> The number of brokers you use is dependent on the number of
>> >>> >> >>>>> messages/sec you're going to receive, the size of those messages, and
>> >>> >> >>>>> how long you're going to retain them.
>> >>> >> >>>>>
>> >>> >> >>>>> Here is a good blog post on Kafka performance that should give you some
>> >>> >> >>>>> idea of the numbers:
>> >>> >> >>>>>
>> >>> >> >>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>> >>> >> >>>>>
>> >>> >> >>>>>>> It could be just one job, but what is the best way to deploy many
>> >>> >> >>>>>>> instances of this job so I could process a heavy load of messages?
>> >>> >> >>>>>
>> >>> >> >>>>> You should adjust the yarn.container.count to increase the parallelism
>> >>> >> >>>>> of your job. By default, you get one container, but you can adjust this
>> >>> >> >>>>> up to the total number of input partitions that you have. Have a look
>> >>> >> >>>>> here for some details about how Samza's parallelism works:
>> >>> >> >>>>>
>> >>> >> >>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>> >>> >> >>>>>
>> >>> >> >>>>>
>> >>> >> >>>>>
>> >>> >> >>>>>
>> >>> >> >>>>> Cheers,
>> >>> >> >>>>> Chris
>> >>> >> >>>>>
>> >>> >> >>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>> >>> >> >>>>>
>> >>> >> >>>>>> Should I use many Kafka brokers, or will one suffice?
>> >>> >> >>>>>>
>> >>> >> >>>>>> Thanks
>> >>> >> >>>>>>
>> >>> >> >>>>>>
>> >>> >> >>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]> wrote:
>> >>> >> >>>>>>
>> >>> >> >>>>>>> It could be just one job, but what is the best way to deploy many
>> >>> >> >>>>>>> instances of this job so I could process a heavy load of messages?
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> Thanks,
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>> >>> >> >>>>>>>
>> >>> >> >>>>>>>> *"Does one kafka-broker handle this much messages per second?"*
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> I believe @Chris has better answer about this.
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> *"I have one job that get this messages and another that reads from the
>> >>> >> >>>>>>>> output of the first job that does some more processing."*
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>>   Why not use one job to get the messages and process them?
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> *"when I change a configuration of one my jobs do I need to recompile it
>> >>> >> >>>>>>>> and send the new tar.gz to hdfs or just change the deploy/samza config
>> >>> >> >>>>>>>> and it should work."*
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>>   No, you don't need to recompile. Change the config and run-job. It
>> >>> >> >>>>>>>>   will work.
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> Thanks.
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> Cheers,
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> Fang, Yan
>> >>> >> >>>>>>>> [email protected]
>> >>> >> >>>>>>>> +1 (206) 849-4108
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]> wrote:
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>>> Not completely related to the topic of the question, but when I change a
>> >>> >> >>>>>>>>> configuration of one of my jobs, do I need to recompile it and send the
>> >>> >> >>>>>>>>> new tar.gz to HDFS, or can I just change the deploy/samza config and it
>> >>> >> >>>>>>>>> should work?
>> >>> >> >>>>>>>>>
>> >>> >> >>>>>>>>> Thanks
>> >>> >> >>>>>>>>>
>> >>> >> >>>>>>>>>
>> >>> >> >>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]> wrote:
>> >>> >> >>>>>>>>>
>> >>> >> >>>>>>>>>> Hi, I'm running an experiment in which I'm supposed to run Samza with
>> >>> >> >>>>>>>>>> different input rates. First I'm running with 420 messages/second, and I
>> >>> >> >>>>>>>>>> scale up to 33200 messages/second.
>> >>> >> >>>>>>>>>>
>> >>> >> >>>>>>>>>> Does one Kafka broker handle this many messages per second?
>> >>> >> >>>>>>>>>> Second, what is the best way to read this many messages into Samza? I
>> >>> >> >>>>>>>>>> have one job that gets these messages and another that reads from the
>> >>> >> >>>>>>>>>> output of the first job and does some more processing. Is the best way
>> >>> >> >>>>>>>>>> to use more containers and split the Kafka topics into partitions (the
>> >>> >> >>>>>>>>>> same number as containers), or is there a better way to do this?
>> >>> >> >>>>>>>>>>
>> >>> >> >>>>>>>>>> Thanks in advance,
>> >>> >> >>>>>>>>>>
>> >>> >> >>>>>>>>>> --
>> >>> >> >>>>>>>>>> ------------------------------------------
>> >>> >> >>>>>>>>>> Telles Mota Vidal Nobrega
>> >>> >> >>>>>>>>>> M.sc. Candidate at UFCG
>> >>> >> >>>>>>>>>> B.sc. in Computer Science at UFCG
>> >>> >> >>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
>> >>> >> >>>>>>>>>>
