Chris, is there a way to completely eliminate buffering in Samza + Kafka?

On Mon, Aug 18, 2014 at 1:46 PM, Telles Nobrega <[email protected]> wrote:
> I see. Thanks. Weird thing is it works for some rounds and then stops.
>
> On Mon, Aug 18, 2014 at 1:44 PM, Chris Riccomini <[email protected]> wrote:
> Hey Telles,
>
> The problem could occur with HDFS. I believe that LOCALIZING just means
> that the NM is trying to download the artifact from wherever it is (be
> that HTTP, HDFS, etc).
>
> Cheers,
> Chris
>
> On 8/18/14 9:22 AM, "Telles Nobrega" <[email protected]> wrote:
> Chris,
>
> I'm using HDFS. I will run again and see if the problem happens, and I
> will post if I find any problem or have more questions.
>
> Thanks.
>
> On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <[email protected]> wrote:
> Hey Telles,
>
> Usually, when a job is stuck in LOCALIZING, it means that YARN is
> struggling to distribute your binary (the .tgz) to the appropriate
> NodeManagers, I think. You should check your NM logs and see if there
> are any hints about what's going on there.
>
> I've seen this in the past when the NM hangs trying to download a .tgz
> from the HTTP server for some reason.
>
> Cheers,
> Chris
>
> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]> wrote:
> I was able to fix this problem, now I'm having another one. I'm using a
> script that starts Kafka, deploys Samza jobs, stops them, kills Kafka,
> and deletes the configurations in ZooKeeper and the kafka-log files.
> Then it starts over again. I see that sometimes jobs don't start
> running; they stay in the accepted state with info LOCALIZING. What can
> be the cause of that?
>
> Thanks.
>
> On 15 Aug 2014, at 19:18, Chris Riccomini <[email protected]> wrote:
> Hey Telles,
>
> If you set yarn.container.count to 5, you should get 5 containers. The
> two cases where you don't are:
>
> 1. The grid is at capacity, and doesn't have the memory to fulfill all
>    container requests.
> 2. You set yarn.container.count higher than the number of partitions
>    that your input stream has.
>
> Cheers,
> Chris
>
> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
> Hi Chris,
>
> I started playing with yarn.container.count and set it to 5.
>
> At first I thought I had to compile the package again and republish it
> to HDFS because I couldn't run 5 containers.
> Then I recompiled, but I still only got 3 containers. Is that normal
> behaviour?
>
> Thanks.
>
> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]> wrote:
> Thanks Chris, I will take a look at these links and I will come back if
> I have more questions.
>
> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <[email protected]> wrote:
> Hey Telles,
>
>> Should I use many kafka brokers or one will suffice?
>
> The number of brokers you use is dependent on the number of
> messages/sec you're going to receive, the size of those messages, and
> how long you're going to retain them.
>
> Here is a good blog post on Kafka performance that should give you some
> idea of the numbers:
>
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>
>> It could be just one job, but what is the best way to deploy many
>> instances of this job so I could process a heavy load of messages?
>
> You should adjust the yarn.container.count to increase the parallelism
> of your job. By default, you get one container, but you can adjust this
> up to the total number of input partitions that you have. Have a look
> here for some details about how Samza's parallelism works:
>
> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>
> Cheers,
> Chris
>
> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
> Should I use many Kafka brokers or will one suffice?
>
> Thanks
>
> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]> wrote:
> It could be just one job, but what is the best way to deploy many
> instances of this job so I could process a heavy load of messages?
>
> Thanks,
>
> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
> *"Does one kafka-broker handle this much messages per second?"*
>
> I believe @Chris has a better answer about this.
>
> *"I have one job that get this messages and another that reads from the
> output of the first job that does some more processing."*
>
> Why not use one job to get the messages and process them?
>
> *"when I change a configuration of one my jobs do I need to recompile
> it and send the new tar.gz to hdfs or just change the deploy/samza
> config and it should work."*
>
> No, you don't need to recompile. Change the config and run-job. It will
> work.
>
> Thanks.
>
> Cheers,
>
> Fang, Yan
> [email protected]
>
> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]> wrote:
> Not completely related to the topic of the question, but when I change
> a configuration of one of my jobs, do I need to recompile it and send
> the new tar.gz to HDFS, or can I just change the deploy/samza config
> and it should work?
>
> Thanks
>
> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]> wrote:
> Hi, I'm running an experiment in which I'm supposed to run Samza with
> different input rates. First I'm running with 420 messages/second and
> I scale up to 33200 messages/second.
>
> Does one kafka-broker handle this many messages per second?
> Second, what is the best way to read this many messages into Samza? I
> have one job that gets these messages and another that reads from the
> output of the first job and does some more processing. Is the best way
> to use more containers and split the Kafka topics into partitions (the
> same number as containers), or is there a better way to do this?
>
> Thanks in advance,

--
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
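
For reference, the two cases Chris lists in the quoted thread (grid at capacity, or yarn.container.count higher than the number of input partitions) can be sketched as a simple upper bound. This is only an illustrative model, not Samza or YARN code: the function name, the memory parameters, and the idea of dividing free grid memory by container size are all assumptions made for the example, and it ignores the memory taken by the application master.

```python
def expected_containers(requested, input_partitions, grid_free_mb, container_mb):
    """Rough upper bound on running containers.

    Hypothetical helper, not part of Samza: it only models the two limits
    named in the thread (grid capacity and input partition count).
    """
    if requested < 1 or input_partitions < 1 or container_mb <= 0:
        raise ValueError("counts must be >= 1 and container_mb must be > 0")
    # Case 1: the grid can only hold as many containers as its free memory allows.
    fits_in_grid = grid_free_mb // container_mb
    # Case 2: asking for more containers than input partitions gains nothing.
    return min(requested, input_partitions, fits_in_grid)


if __name__ == "__main__":
    # yarn.container.count=5, but the input stream has only 3 partitions.
    print(expected_containers(5, 3, 8192, 1024))
    # yarn.container.count=5, 8 partitions, but only 3 GB free on the grid.
    print(expected_containers(5, 8, 3072, 1024))
```

Under this model, asking for 5 containers and getting 3 would be consistent with either an input topic that has 3 partitions or a grid with room for only 3 containers.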
