Starfish worked great for wordcount ... I didn't run it on my application because I have only map tasks.
Mark

On Thu, Mar 1, 2012 at 4:34 AM, Charles Earl <charles.ce...@gmail.com> wrote:

> How was your experience of starfish?
> C
> On Mar 1, 2012, at 12:35 AM, Mark question wrote:
>
> > Thank you for your time and suggestions, I've already tried starfish, but
> > not jmap. I'll check it out.
> > Thanks again,
> > Mark
> >
> > On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl <charles.ce...@gmail.com> wrote:
> >
> >> I assume you have also just tried running locally and using the jdk
> >> performance tools (e.g. jmap) to gain insight by configuring hadoop to run
> >> the absolute minimum number of tasks?
> >> Perhaps the discussion
> >> http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-task
> >> might be relevant?
> >> On Feb 29, 2012, at 3:53 PM, Mark question wrote:
> >>
> >>> I've used hadoop profiling (.prof) to show the stack trace, but it was
> >>> hard to follow. I used jConsole locally, since I couldn't find a way to
> >>> set a port number for child processes when running them remotely. Linux
> >>> commands (top, /proc) showed me that virtual memory is almost twice my
> >>> physical memory, which means swapping is happening, which is what I'm
> >>> trying to avoid.
> >>>
> >>> So basically, is there a way to assign a port to child processes to
> >>> monitor them remotely (asked before by Xun), or would you recommend
> >>> another monitoring tool?
> >>>
> >>> Thank you,
> >>> Mark
> >>>
> >>> On Wed, Feb 29, 2012 at 11:35 AM, Charles Earl <charles.ce...@gmail.com> wrote:
> >>>
> >>>> Mark,
> >>>> So if I understand, it is more the memory management that you are
> >>>> interested in, rather than a need to run an existing C or C++
> >>>> application on the MapReduce platform?
> >>>> Have you done profiling of the application?
> >>>> C
> >>>> On Feb 29, 2012, at 2:19 PM, Mark question wrote:
> >>>>
> >>>>> Thanks Charles ... I'm running Hadoop for research to perform duplicate
> >>>>> detection methods. To go deeper, I need to understand what's slowing my
> >>>>> program, which usually starts with analyzing memory to predict the best
> >>>>> input size for a map task. So you're saying piping can help me control
> >>>>> memory even though it's running on a VM eventually?
> >>>>>
> >>>>> Thanks,
> >>>>> Mark
> >>>>>
> >>>>> On Wed, Feb 29, 2012 at 11:03 AM, Charles Earl <charles.ce...@gmail.com> wrote:
> >>>>>
> >>>>>> Mark,
> >>>>>> Both streaming and pipes allow this, perhaps more so pipes at the
> >>>>>> level of the mapreduce task. Can you provide more details on the
> >>>>>> application?
> >>>>>> On Feb 29, 2012, at 1:56 PM, Mark question wrote:
> >>>>>>
> >>>>>>> Hi guys, thought I should ask this before I use it ... will using C
> >>>>>>> over Hadoop give me the usual C memory management? For example,
> >>>>>>> malloc(), sizeof()? My guess is no, since this all will eventually be
> >>>>>>> turned into bytecode, but I need more control over memory, which
> >>>>>>> obviously is hard for me to do with Java.
> >>>>>>>
> >>>>>>> Let me know of any advantages you know about streaming in C over
> >>>>>>> hadoop.
> >>>>>>> Thank you,
> >>>>>>> Mark
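
On the remote-monitoring question in the thread (assigning a port to child task JVMs so jConsole/VisualVM can attach): one commonly suggested approach, sketched here for a 0.20/1.x-style setup and not taken from this thread, is to pass JMX flags through mapred.child.java.opts. A fixed JMX port can only be bound by one child JVM per node at a time, so this pairs with running the absolute minimum number of tasks, as suggested above. The values below (port 8009, 512m heap) are placeholders:

    <!-- mapred-site.xml: illustrative only -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8009 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false</value>
    </property>
    <!-- one map slot per node, so the fixed port never collides -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>1</value>
    </property>

Then point jConsole or VisualVM at <tasktracker-host>:8009 while a task is running. Disabling authentication and SSL like this is only sensible on a private test cluster.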
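
On the root question (whether C over Hadoop keeps the usual malloc()/sizeof() semantics): with Hadoop Streaming the mapper is an ordinary native executable that reads one record per line on stdin and writes key<TAB>value lines on stdout, so the C heap is managed exactly as in any C program; only the surrounding wrapper task runs in a JVM. A minimal sketch, purely illustrative (the names dup_mapper / mapper.c are placeholders):

    /* mapper.c -- minimal Hadoop Streaming mapper (illustrative sketch).
       Streaming feeds one input record per line on stdin and expects
       "key<TAB>value" lines on stdout. Everything here is ordinary C:
       memory is managed with malloc/realloc/free, no JVM in user code. */
    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        char *line = NULL;   /* getline() grows this buffer with realloc */
        size_t cap = 0;
        ssize_t len;

        while ((len = getline(&line, &cap, stdin)) != -1) {
            /* strip the trailing newline */
            if (len > 0 && line[len - 1] == '\n')
                line[--len] = '\0';

            /* toy key for duplicate detection: emit the record itself as
               the key, so identical records sort together downstream */
            printf("%s\t1\n", line);
        }

        free(line);          /* plain C memory management, as usual */
        return 0;
    }

Compiled and submitted roughly like this (the streaming jar path varies by distribution):

    gcc -O2 -o dup_mapper mapper.c
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -input in/ -output out/ \
        -mapper dup_mapper -file dup_mapper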