Hey Telles,

The problem could occur with HDFS. I believe that LOCALIZING just means that the NodeManager (NM) is trying to download the artifact from wherever it is (be that HTTP, HDFS, etc.).

Cheers,
Chris

On 8/18/14 9:22 AM, "Telles Nobrega" wrote:

> Chris,
>
> I'm using HDFS. I will run it again and see if the problem happens, and I
> will post if I find any problem or have more questions.
>
> Thanks.
>
> On Mon, Aug 18, 2014 at 12:45 PM, Chris Riccomini wrote:
>
>> Hey Telles,
>>
>> Usually, when a job is stuck in LOCALIZING, it means that YARN is
>> struggling to distribute your binary (the .tgz) to the appropriate
>> NodeManagers, I think. You should check your NM logs and see if there
>> are any hints about what's going on there.
>>
>> I've seen this in the past when the NM hangs trying to download a .tgz
>> from the HTTP server for some reason.
>>
>> Cheers,
>> Chris
>>
>> On 8/16/14 10:41 PM, "Telles Nobrega" wrote:
>>
>>> I was able to fix this problem, now I'm having another one. I'm using a
>>> script that starts Kafka, deploys Samza jobs, stops them, kills Kafka,
>>> and deletes the configurations in ZooKeeper and the kafka-log files.
>>> Then it starts over again. I see that sometimes jobs don't start
>>> running; they stay in the accepted state with info LOCALIZING. What can
>>> be the cause of that?
>>>
>>> Thanks.
>>>
>>> On 15 Aug 2014, at 19:18, Chris Riccomini wrote:
>>>
>>>> Hey Telles,
>>>>
>>>> If you set yarn.container.count to 5, you should get 5 containers. The
>>>> two cases where you don't are:
>>>>
>>>> 1. The grid is at capacity, and doesn't have the memory to fulfill all
>>>>    container requests.
>>>> 2. You set yarn.container.count higher than the number of partitions
>>>>    that your input stream has.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On 8/15/14 1:56 PM, "Telles Nobrega" wrote:
>>>>
>>>>> Hi Chris,
>>>>>
>>>>> I started playing with yarn.container.count and set it to 5.
>>>>>
>>>>> At first I thought I had to compile the package again and republish
>>>>> it to HDFS because I couldn't run 5 containers.
>>>>> Then I recompiled, but I still only got 3 containers. Is that normal
>>>>> behaviour?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On Wed, Aug 13, 2014 at 5:00 PM, Telles Nobrega wrote:
>>>>>
>>>>>> Thanks Chris, I will take a look at these links and I will come back
>>>>>> if I have more questions.
>>>>>>
>>>>>> On Wed, Aug 13, 2014 at 4:33 PM, Chris Riccomini wrote:
>>>>>>
>>>>>>> Hey Telles,
>>>>>>>
>>>>>>>>> Should I use many kafka brokers or one will suffice?
>>>>>>>
>>>>>>> The number of brokers you use is dependent on the number of
>>>>>>> messages/sec you're going to receive, the size of those messages,
>>>>>>> and how long you're going to retain them.
>>>>>>>
>>>>>>> Here is a good blog post on Kafka performance that should give you
>>>>>>> some idea of the numbers:
>>>>>>>
>>>>>>> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>>>>>>>
>>>>>>>>> It could be just one job, but what is the best way to deploy many
>>>>>>>>> instances of this job so I could process a heavy load of
>>>>>>>>> messages?
>>>>>>>
>>>>>>> You should adjust the yarn.container.count to increase the
>>>>>>> parallelism of your job. By default, you get one container, but you
>>>>>>> can adjust this up to the total number of input partitions that you
>>>>>>> have. Have a look here for some details about how Samza's
>>>>>>> parallelism works:
>>>>>>>
>>>>>>> http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>> On 8/13/14 9:37 AM, "Telles Nobrega" wrote:
>>>>>>>
>>>>>>>> Should I use many kafka brokers or one will suffice?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Wed, Aug 13, 2014 at 7:24 AM, Telles Nobrega wrote:
>>>>>>>>
>>>>>>>>> It could be just one job, but what is the best way to deploy many
>>>>>>>>> instances of this job so I could process a heavy load of
>>>>>>>>> messages?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> On 13 Aug 2014, at 01:39, Yan Fang wrote:
>>>>>>>>>
>>>>>>>>>> *"Does one kafka-broker handle this many messages per second?"*
>>>>>>>>>>
>>>>>>>>>> I believe @Chris has a better answer about this.
>>>>>>>>>>
>>>>>>>>>> *"I have one job that gets these messages and another that reads
>>>>>>>>>> from the output of the first job that does some more
>>>>>>>>>> processing."*
>>>>>>>>>>
>>>>>>>>>> Why not use one job to get the messages and process them?
>>>>>>>>>>
>>>>>>>>>> *"when I change a configuration of one of my jobs do I need to
>>>>>>>>>> recompile it and send the new tar.gz to hdfs or just change the
>>>>>>>>>> deploy/samza config and it should work."*
>>>>>>>>>>
>>>>>>>>>> No, you don't need to recompile. Change the config and run
>>>>>>>>>> run-job. It will work.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Fang, Yan
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 12, 2014 at 8:47 PM, Telles Nobrega wrote:
>>>>>>>>>>
>>>>>>>>>>> Not completely related to the topic of the question, but when I
>>>>>>>>>>> change a configuration of one of my jobs, do I need to
>>>>>>>>>>> recompile it and send the new tar.gz to HDFS, or can I just
>>>>>>>>>>> change the deploy/samza config and it should work?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 12, 2014 at 11:23 PM, Telles Nobrega wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, I'm running an experiment in which I'm supposed to run
>>>>>>>>>>>> Samza with different input rates. First I'm running with 420
>>>>>>>>>>>> messages/second and I scale up to 33200 messages/second.
>>>>>>>>>>>>
>>>>>>>>>>>> Does one kafka-broker handle this many messages per second?
>>>>>>>>>>>> Second, what is the best way to read this many messages into
>>>>>>>>>>>> Samza? I have one job that gets these messages and another
>>>>>>>>>>>> that reads from the output of the first job and does some more
>>>>>>>>>>>> processing. Is the best way to use more containers and split
>>>>>>>>>>>> the Kafka topics into partitions (the same number as
>>>>>>>>>>>> containers), or is there a better way to do this?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> ------------------------------------------
>>>>>>>>>>>> Telles Mota Vidal Nobrega
>>>>>>>>>>>> M.sc. Candidate at UFCG
>>>>>>>>>>>> B.sc. in Computer Science at UFCG
>>>>>>>>>>>> Software Engineer at OpenStack Project - HP/LSD-UFCG
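
For anyone reading this thread later: the container-count rule Chris describes (you get what you ask for in yarn.container.count unless the grid is full or you ask for more than the number of input partitions) can be sketched roughly as below. This is only an illustration of the reasoning in the thread. The function name `effective_container_count` and the `grid_capacity` parameter are hypothetical, not part of Samza or YARN, and the real behaviour depends on the Samza version and the cluster.

```python
def effective_container_count(requested, input_partitions, grid_capacity=None):
    """Estimate how many containers a job is likely to end up with.

    Assumption taken from the thread, not from Samza's source code: the
    requested count is bounded by the number of input partitions and by
    how many containers the grid can currently fit.
    """
    if requested < 1 or input_partitions < 1:
        raise ValueError("requested and input_partitions must be >= 1")

    # Case 2 in the thread: asking for more containers than partitions.
    count = min(requested, input_partitions)

    # Case 1 in the thread: the grid lacks memory for every request.
    # grid_capacity is a hypothetical "containers that fit" figure.
    if grid_capacity is not None:
        count = min(count, max(grid_capacity, 0))

    return count


if __name__ == "__main__":
    # Requesting 5 containers against a 3-partition input stream.
    print(effective_container_count(requested=5, input_partitions=3))
```

If a job asks for 5 containers and only 3 show up, comparing the partition count of the input topic with the requested count is probably the quickest first check before looking at grid memory.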
