Hi Chris, I started playing with the yarn.container.count and set it to 5.
At first I thought I had to recompile the package and republish it to HDFS, because I couldn't run 5 containers. I recompiled, but I still only got 3 containers. Is that normal behaviour?

Thanks.

On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]> wrote:

> Thanks Chris, I will take a look at these links and I will come back if I
> have more questions.
>
> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <[email protected]> wrote:
>
>> Hey Telles,
>>
>>> Should I use many kafka brokers or one will suffice?
>>
>> The number of brokers you use is dependent on the number of messages/sec
>> you're going to receive, the size of those messages, and how long you're
>> going to retain them.
>>
>> Here is a good blog post on Kafka performance that should give you some
>> idea of the numbers:
>>
>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>>
>>> It could be just one job, but what is the best way to deploy many
>>> instances of this job so I could process a heavy load of messages?
>>
>> You should adjust the yarn.container.count to increase the parallelism of
>> your job. By default, you get one container, but you can adjust this up to
>> the total number of input partitions that you have. Have a look here for
>> some details about how Samza's parallelism works:
>>
>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>>
>> Cheers,
>> Chris
>>
>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>>
>>> Should I use many kafka brokers or one will suffice?
>>>
>>> Thanks
>>>
>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]> wrote:
>>>
>>>> It could be just one job, but what is the best way to deploy many
>>>> instances of this job so I could process a heavy load of messages?
>>>>
>>>> Thanks,
>>>>
>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>>>>
>>>>> *"Does one kafka-broker handle this many messages per second?"*
>>>>>
>>>>> I believe @Chris has a better answer about this.
>>>>>
>>>>> *"I have one job that gets these messages and another that reads from
>>>>> the output of the first job and does some more processing."*
>>>>>
>>>>> Why not use one job to get the messages and process them?
>>>>>
>>>>> *"when I change a configuration of one of my jobs do I need to
>>>>> recompile it and send the new tar.gz to hdfs or just change the
>>>>> deploy/samza config and it should work."*
>>>>>
>>>>> No, you don't need to recompile. Change the config and run-job. It
>>>>> will work.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Fang, Yan
>>>>> [email protected]
>>>>> +1 (206) 849-4108
>>>>>
>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]> wrote:
>>>>>
>>>>>> Not completely related to the topic of the question, but when I
>>>>>> change a configuration of one of my jobs do I need to recompile it
>>>>>> and send the new tar.gz to hdfs, or just change the deploy/samza
>>>>>> config and it should work?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]> wrote:
>>>>>>
>>>>>>> Hi, I'm running an experiment in which I'm supposed to run samza
>>>>>>> with different input rates. First I'm running with 420
>>>>>>> messages/second and I scale up to 33200 messages/second.
>>>>>>>
>>>>>>> Does one kafka-broker handle this many messages per second?
>>>>>>> Second, what is the best way to read this many messages into
>>>>>>> samza? I have one job that gets these messages and another that
>>>>>>> reads from the output of the first job and does some more
>>>>>>> processing. Is the best way to use more containers and split the
>>>>>>> kafka topics into partitions (the same number as containers), or
>>>>>>> is there a better way to do this?
>>>>>>>
>>>>>>> Thanks in advance,

--
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
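
A note on the container count question: Chris's remark in the quoted thread, that yarn.container.count can be raised up to the total number of input partitions, can be illustrated with a small sketch. This is not Samza code. It assumes one unit of work per input partition and a simple modulo assignment of partitions to containers; the names `assign_partitions` and `busy_containers` and the partition count of 3 are hypothetical, picked only for illustration. Under those assumptions, asking for 5 containers on a 3-partition input leaves only 3 containers with anything to do. That may be one explanation for seeing 3 containers, but the thread never states how many partitions the topic has, so treat it as a guess to check rather than a diagnosis.

```python
def assign_partitions(num_partitions, container_count):
    """Map each container id to the partition ids it would process.

    Hypothetical scheme for illustration only: partition p goes to
    container p % container_count. This is an assumption, not a
    statement about Samza's actual assignment logic.
    """
    if num_partitions < 0:
        raise ValueError("num_partitions must be non-negative")
    if container_count < 1:
        raise ValueError("container_count must be at least 1")

    assignment = {container: [] for container in range(container_count)}
    for partition in range(num_partitions):
        assignment[partition % container_count].append(partition)
    return assignment


def busy_containers(assignment):
    """Return the ids of containers that received at least one partition."""
    return [container for container, parts in assignment.items() if parts]


if __name__ == "__main__":
    # Assumed numbers: 3 input partitions, yarn.container.count set to 5.
    plan = assign_partitions(num_partitions=3, container_count=5)
    for container, parts in plan.items():
        print(f"container {container}: partitions {parts}")
    print("containers with work:", len(busy_containers(plan)))
```

If this is what is happening, the thing to check is the partition count of the input topic: under the assumptions above, extra containers only help once the topic has at least as many partitions as the requested container count.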
