I see. Thanks. The weird thing is that it works for some rounds and then stops.
On Mon, Aug 18, 2014 at 1:44 PM, Chris Riccomini <[email protected]> wrote:

> Hey Telles,
>
> The problem could occur with HDFS. I believe that LOCALIZING just means
> that the NM is trying to download the artifact from wherever it is (be
> that HTTP, HDFS, etc).
>
> Cheers,
> Chris
>
> On 8/18/14 9:22 AM, "Telles Nobrega" <[email protected]> wrote:
>
>> Chris,
>>
>> I'm using HDFS. I will run again and see if the problem happens, and I
>> will post if I find any problem or have more questions.
>>
>> Thanks.
>>
>> On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini <[email protected]> wrote:
>>
>>> Hey Telles,
>>>
>>> Usually, when a job is stuck in LOCALIZING, it means that YARN is
>>> struggling to distribute your binary (the .tgz) to the appropriate
>>> NodeManagers, I think. You should check your NM logs and see if there
>>> are any hints about what's going on there.
>>>
>>> I've seen this in the past when the NM hangs trying to download a .tgz
>>> from the HTTP server for some reason.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On 8/16/14 10:41 PM, "Telles Nobrega" <[email protected]> wrote:
>>>
>>>> I was able to fix this problem, now I'm having another one. I'm using
>>>> a script that starts kafka, deploys samza jobs, stops them, kills
>>>> kafka, and deletes the configurations in zookeeper and the kafka-log
>>>> files. Then it starts over again. I see that sometimes jobs don't
>>>> start running; they stay in accepted state with info LOCALIZING. What
>>>> can be the cause of that?
>>>>
>>>> Thanks.
>>>>
>>>> On 15 Aug 2014, at 19:18, Chris Riccomini <[email protected]> wrote:
>>>>
>>>>> Hey Telles,
>>>>>
>>>>> If you set yarn.container.count to 5, you should get 5 containers.
>>>>> The two cases where you don't are:
>>>>>
>>>>> 1. The grid is at capacity, and doesn't have the memory to fulfill
>>>>>    all container requests.
>>>>> 2. You set yarn.container.count higher than the number of partitions
>>>>>    that your input stream has.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On 8/15/14 1:56 PM, "Telles Nobrega" <[email protected]> wrote:
>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> I started playing with the yarn.container.count and set it to 5.
>>>>>>
>>>>>> At first I thought I had to compile the package again and republish
>>>>>> to hdfs because I couldn't run 5 containers.
>>>>>> Then I recompiled but I still only got 3 containers. Is that normal
>>>>>> behaviour?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Chris, I will take a look at these links and I will come
>>>>>>> back if I have more questions.
>>>>>>>
>>>>>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hey Telles,
>>>>>>>>
>>>>>>>>>> Should I use many kafka brokers or will one suffice?
>>>>>>>>
>>>>>>>> The number of brokers you use is dependent on the number of
>>>>>>>> messages/sec you're going to receive, the size of those messages,
>>>>>>>> and how long you're going to retain them.
>>>>>>>>
>>>>>>>> Here is a good blog post on Kafka performance that should give you
>>>>>>>> some idea of the numbers:
>>>>>>>>
>>>>>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>>>>>>>>
>>>>>>>>>> It could be just one job, but what is the best way to deploy many
>>>>>>>>>> instances of this job so I could process a heavy load of messages?
>>>>>>>>
>>>>>>>> You should adjust the yarn.container.count to increase the
>>>>>>>> parallelism of your job. By default, you get one container, but you
>>>>>>>> can adjust this up to the total number of input partitions that you
>>>>>>>> have. Have a look here for some details about how Samza's
>>>>>>>> parallelism works:
>>>>>>>>
>>>>>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 8/13/14 9:37 AM, "Telles Nobrega" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Should I use many kafka brokers or will one suffice?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> It could be just one job, but what is the best way to deploy many
>>>>>>>>>> instances of this job so I could process a heavy load of messages?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> On 13 Aug 2014, at 01:39, Yan Fang <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> *"Does one kafka-broker handle this many messages per second?"*
>>>>>>>>>>>
>>>>>>>>>>> I believe @Chris has a better answer about this.
>>>>>>>>>>>
>>>>>>>>>>> *"I have one job that gets these messages and another that reads
>>>>>>>>>>> from the output of the first job and does some more processing."*
>>>>>>>>>>>
>>>>>>>>>>> Why not use one job to get the messages and process them?
>>>>>>>>>>>
>>>>>>>>>>> *"when I change a configuration of one of my jobs, do I need to
>>>>>>>>>>> recompile it and send the new tar.gz to hdfs, or can I just change
>>>>>>>>>>> the deploy/samza config and it should work?"*
>>>>>>>>>>>
>>>>>>>>>>> No, you don't need to recompile. Change the config and run-job.
>>>>>>>>>>> It will work.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Fang, Yan
>>>>>>>>>>> [email protected]
>>>>>>>>>>> +1 (206) 849-4108
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Not completely related to the topic of the question, but when I
>>>>>>>>>>>> change a configuration of one of my jobs, do I need to recompile
>>>>>>>>>>>> it and send the new tar.gz to hdfs, or can I just change the
>>>>>>>>>>>> deploy/samza config and it should work?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, I'm running an experiment in which I'm supposed to run samza
>>>>>>>>>>>>> with different input rates. First I'm running with 420
>>>>>>>>>>>>> messages/second and I scale up to 33200 messages/second.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does one kafka-broker handle this many messages per second?
>>>>>>>>>>>>> Second, what is the best way to read this many messages into
>>>>>>>>>>>>> samza? I have one job that gets these messages and another that
>>>>>>>>>>>>> reads from the output of the first job and does some more
>>>>>>>>>>>>> processing. Is the best way to use more containers and split the
>>>>>>>>>>>>> kafka topics into partitions (the same number as containers), or
>>>>>>>>>>>>> is there a better way to do this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> ------------------------------------------
>>>>>>>>>>>>> Telles Mota Vidal Nobrega
>>>>>>>>>>>>> M.sc. Candidate at UFCG
>>>>>>>>>>>>> B.sc. in Computer Science at UFCG
>>>>>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG

--
------------------------------------------
Telles Mota Vidal Nobrega
M.sc. Candidate at UFCG
B.sc. in Computer Science at UFCG
Software Engineer at OpenStack Project - HP/LSD-UFCG
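
For reference, the two cases Chris lists for getting fewer containers than yarn.container.count can be sketched as a simple upper bound. This is only a rough model of what is said in the thread, not Samza's or YARN's actual allocation logic. The function name and the grid_slots parameter (how many containers the grid could fit right now) are hypothetical, and the numbers in the usage lines are made up.

```python
def expected_containers(requested: int, input_partitions: int, grid_slots: int) -> int:
    """Rough upper bound on the containers a job can end up with.

    requested        -- the value set for yarn.container.count
    input_partitions -- number of partitions of the input stream
    grid_slots       -- hypothetical: containers the grid has memory for
    """
    if min(requested, input_partitions, grid_slots) < 0:
        raise ValueError("all arguments must be non-negative")
    # Case 1 in the thread: the grid is at capacity (grid_slots is the limit).
    # Case 2 in the thread: the count exceeds the number of input partitions.
    return min(requested, input_partitions, grid_slots)


if __name__ == "__main__":
    # Made-up numbers: asking for 5 containers against a 3-partition topic.
    print(expected_containers(requested=5, input_partitions=3, grid_slots=10))
    # Made-up numbers: enough partitions, but the grid only fits 2 containers.
    print(expected_containers(requested=5, input_partitions=8, grid_slots=2))
```

Under this model, a request for 5 containers that yields only 3 would point at either a 3-partition input topic or a grid with room for only 3 containers, so the partition count and the NM/RM memory settings are the two things to check.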
