Thank you very much for the help. I am going to start working on it soon (in a few days) and will probably have more questions :)

Eyal Golan
egola...@gmail.com

Visit: http://jvdrums.sourceforge.net/
LinkedIn: http://www.linkedin.com/in/egolan74
Skype: egolan74

Save a tree. Please don't print this e-mail unless it's really necessary


On Mon, Jan 2, 2012 at 2:01 AM, Anirudh <techie.anir...@gmail.com> wrote:

> Is there any specific reason why setup is called for every task attempt?
> From an optimization point of view, wouldn't it be good if setup were
> called only once in case of JVM reuse?
> I have not yet looked at the implementation. In case of JVM reuse, is the
> application Mapper instance reused, or is a new instance created for every
> task attempt?
>
> My suggestion for Eyal would be to have a static field initializer
> expression in the Mapper to create the helper class instance. This will
> ensure that the helper class is instantiated when the Mapper class is
> loaded.
>
> On Sun, Jan 1, 2012 at 7:05 AM, Harsh J <ha...@cloudera.com> wrote:
>
>> You are guaranteed one setup call for every single task attempt. This
>> is regardless of JVM reuse being on or off. JVM reuse will cause no
>> issues with what Eyal is attempting to do.
>>
>> On Sun, Jan 1, 2012 at 5:49 PM, Anirudh <techie.anir...@gmail.com> wrote:
>> > No problems Eyal.
>> >
>> > On second thought, for JVM reuse the Mapper/Reducer instances should
>> > be re-used, and the setup should be called only once. This makes sense
>> > too, as the JVM reuse is for the same job.
>> > You should be good with class instantiation even if JVM reuse is
>> > enabled.
>> >
>> > On Sat, Dec 31, 2011 at 11:39 PM, Eyal Golan <egola...@gmail.com> wrote:
>> >>
>> >> Thank you very much for the detailed explanation Anirudh.
>> >>
>> >> I think that my question about node / VM was due to some lack of
>> >> knowledge (I'm just starting to learn the Hadoop environment).
>> >> Regarding configuration of the nodes and clusters: this is something
>> >> that I am not doing myself. We have a dedicated team for managing the
>> >> Hadoop cluster and I'll ask them.
>> >>
>> >> I think that my question should have been: how many instances of the
>> >> 'helper' class will be created in a single VM?
>> >> And, as I understand it, considering that I am creating the helper in
>> >> the setup / configure method, there would be one.
>> >> And as long as it's stateless, I'm good.
>> >>
>> >> Thanks again,
>> >>
>> >> Eyal
>> >>
>> >> On Sat, Dec 31, 2011 at 1:36 PM, Anirudh <techie.anir...@gmail.com> wrote:
>> >>>
>> >>> I just wanted to confirm where exactly you were planning to have the
>> >>> instantiation code, as it was not mentioned in your previous post.
>> >>> The location would have made a difference. As you are doing it in the
>> >>> setup of the mapper/reducer, you are good.
>> >>>
>> >>> I was referring to the Task JVM Reuse option:
>> >>>
>> >>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Task+JVM+Reuse
>> >>>
>> >>> It states that if the option to reuse the JVM is enabled, the same
>> >>> Task JVM will execute multiple tasks (i.e. map/reduce). I am not sure
>> >>> how this is implemented, whether a new Mapper/Reducer is created for
>> >>> each task or they too are re-used.
>> >>> If a new instance is created each time, then the mapper/reducer and
>> >>> all its references will be marked for garbage collection and you
>> >>> would be good.
>> >>> If the Mapper/Reducer instances are re-used, then the setup should be
>> >>> called again, creating another instance of your helper class.
>> >>>
>> >>> In my opinion the latter does not make sense, and the implementation
>> >>> would follow the former approach, i.e. creation of a new
>> >>> Mapper/Reducer for each Task. But it would be interesting to check.
>> >>>
>> >>> As the classes in question are helper classes (stateless), you may
>> >>> not be affected in terms of functionality.
>> >>>
>> >>> I am not clear on one of your statements:
>> >>>
>> >>> How many map tasks will be created? One per split or one per VM (node)?
>> >>> Are you suggesting that although there would be one Mapper in the node...
>> >>>
>> >>> Have you configured your node to have a single slot for map/reduce
>> >>> tasks? If yes, then there will be one Mapper/Reducer task in the
>> >>> node. If no, there could be more than one mapper/reducer in the node,
>> >>> depending on lots of other parameters, i.e. no. of mapper/reducer
>> >>> slots allocated on the node, no. of input splits etc. If the node is
>> >>> configured to run more than one Mapper/Reducer task, the scheduler
>> >>> may choose to run more than one task on the same node. The default is
>> >>> 2 Map & 2 Reduce tasks per node. And for each task a new JVM is
>> >>> launched unless the JVM reuse option is enabled.
>> >>>
>> >>> Thanks,
>> >>> Anirudh
>> >>>
>> >>> On Sat, Dec 31, 2011 at 1:28 AM, Eyal Golan <egola...@gmail.com> wrote:
>> >>>>
>> >>>> My idea is to create that class in the setup / configure method
>> >>>> (depending on which Mapper / Reducer I will inherit from).
>> >>>>
>> >>>> I don't understand the 'reuse' option you are referring to.
>> >>>> How many map tasks will be created? One per split or one per VM (node)?
>> >>>> Are you suggesting that although there would be one Mapper in the
>> >>>> node, each new operator (or reflection call) will create a new
>> >>>> instance? Thus making lots of instances?
>> >>>>
>> >>>> BTW,
>> >>>> these helper classes I want to create are of course not going to be
>> >>>> stateful. They are definitely 'helper' classes that have some logic.
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Eyal
>> >>>>
>> >>>> On Sat, Dec 31, 2011 at 6:50 AM, Anirudh <techie.anir...@gmail.com> wrote:
>> >>>>>
>> >>>>> Where are you creating this new class? If it is in the map
>> >>>>> function, then it will create a new object for each record in the
>> >>>>> split.
>> >>>>>
>> >>>>> Also you may need to see how the JVM reuse option works. I am not
>> >>>>> too sure of this and you may want to look at the code. If the
>> >>>>> option for JVM reuse is set, then my understanding is that for
>> >>>>> every task a new Map task would be created, and in that case the
>> >>>>> "new" operator will create another instance even if this statement
>> >>>>> is not in the map function.
>> >>>>>
>> >>>>> On Fri, Dec 30, 2011 at 6:22 AM, Eyal Golan <egola...@gmail.com> wrote:
>> >>>>>>
>> >>>>>> Great News !!
>> >>>>>> Thanks for the info.
>> >>>>>>
>> >>>>>> So using reflection, I can inject different implementations of
>> >>>>>> interfaces (services) for the mapper (or reducer).
>> >>>>>> And this way I can test a mapper (or reducer),
>> >>>>>> just by reflecting a stub instead of a real implementation.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>>
>> >>>>>> Eyal
>> >>>>>>
>> >>>>>> On Fri, Dec 30, 2011 at 2:50 PM, Harsh J <ha...@cloudera.com> wrote:
>> >>>>>>>
>> >>>>>>> Eyal,
>> >>>>>>>
>> >>>>>>> Yes, it is right to think of each Task attempt being one
>> >>>>>>> individual JVM running individually on any added Node. Multiple
>> >>>>>>> slots would mean multiple VMs in parallel as well. Yes, your use
>> >>>>>>> of reflection to build your objects will work just fine -- it's
>> >>>>>>> all user-side Java code that is executed.
>> >>>>>>>
>> >>>>>>> On 30-Dec-2011, at 4:42 PM, Eyal Golan wrote:
>> >>>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I want to understand a basic concept in MR.
>> >>>>>>>
>> >>>>>>> If a mapper creates an instance of some class (using the 'new'
>> >>>>>>> operator), then the created class exists ONCE in the VM of this
>> >>>>>>> node. For each node.
>> >>>>>>> Correct?
>> >>>>>>>
>> >>>>>>> Now,
>> >>>>>>> what if instead of using the 'new' operator, the class is
>> >>>>>>> created using reflection.
>> >>>>>>> Is it valid in a MR?
>> >>>>>>> Will only one instance of the created class exist in that node?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>>
>> >>>>>>> Eyal
>>
>> --
>> Harsh J
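
For anyone reading this thread later, here is a rough plain-Java sketch of the two points discussed above: a helper built in setup is created once per task attempt (not once per record), and the helper class can be chosen by reflection so a stub can be swapped in for tests. This is only an illustration under stated assumptions. SketchMapper is NOT Hadoop's Mapper, the key "sketch.helper.class" is made up, and the sketch assumes a fresh mapper object per attempt, which the thread does not confirm (it only confirms one setup call per attempt).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper contract standing in for the stateless 'helper' class. */
interface RecordHelper {
    String transform(String record);
}

/** Illustrative "real" helper; counts how many instances this JVM has built. */
class UpperCaseHelper implements RecordHelper {
    static int instancesCreated = 0;
    public UpperCaseHelper() { instancesCreated++; }
    public String transform(String record) { return record.toUpperCase(); }
}

/** Stub that a test could inject instead of the real helper. */
class StubHelper implements RecordHelper {
    public StubHelper() { }
    public String transform(String record) { return "stub:" + record; }
}

/** Plain-Java stand-in for a mapper: setup once per attempt, map once per record. */
class SketchMapper {
    static final String HELPER_CLASS_KEY = "sketch.helper.class"; // made-up key
    private RecordHelper helper;

    void setup(Map<String, String> conf) {
        String name = conf.getOrDefault(HELPER_CLASS_KEY, UpperCaseHelper.class.getName());
        try {
            // Build the helper by reflection from the configured class name.
            helper = (RecordHelper) Class.forName(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create helper " + name, e);
        }
    }

    String map(String record) {
        return helper.transform(record); // reuses the helper; no 'new' per record
    }
}

public class HelperLifecycleSketch {
    /** Simulates one task attempt: one setup call, then one map call per record. */
    static String[] runTaskAttempt(Map<String, String> conf, String... records) {
        SketchMapper mapper = new SketchMapper(); // assumption: new mapper per attempt
        mapper.setup(conf);
        String[] out = new String[records.length];
        for (int i = 0; i < records.length; i++) {
            out[i] = mapper.map(records[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Two attempts in the same JVM, as could happen with JVM reuse.
        runTaskAttempt(conf, "a", "b", "c");
        runTaskAttempt(conf, "d");
        System.out.println("helpers created: " + UpperCaseHelper.instancesCreated); // 2

        // Swap in the stub purely through configuration.
        conf.put(SketchMapper.HELPER_CLASS_KEY, StubHelper.class.getName());
        System.out.println(runTaskAttempt(conf, "x")[0]); // stub:x
    }
}
```

Four records across two attempts produce two helper instances, one per setup call. Since the helper is stateless, the extra instance under JVM reuse should be harmless, which matches Harsh's answer above.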