Starfish worked great for wordcount ... I didn't run it on my application because I have only map tasks.
Mark

On Thu, Mar 1, 2012 at 4:34 AM, Charles Earl <charles.ce...@gmail.com> wrote:

> How was your experience of starfish?
> C
> On Mar 1, 2012, at 12:35 AM, Mark question wrote:
>
> > Thank you for your time and suggestions, I've already tried starfish, but
> > not jmap. I'll check it out.
> > Thanks again,
> > Mark
> >
> > On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl <charles.ce...@gmail.com> wrote:
> >
> >> I assume you have also just tried running locally and using the jdk
> >> performance tools (e.g. jmap) to gain insight by configuring hadoop to run
> >> the absolute minimum number of tasks?
> >> Perhaps the discussion
> >> http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-task
> >> might be relevant?
> >> On Feb 29, 2012, at 3:53 PM, Mark question wrote:
> >>
> >>> I've used hadoop profiling (.prof) to show the stack trace, but it was
> >>> hard to follow. I used jConsole locally, since I couldn't find a way to
> >>> set a port number for child processes when running them remotely. Linux
> >>> commands (top, /proc) showed me that virtual memory is almost twice my
> >>> physical memory, which means swapping is happening, which is what I'm
> >>> trying to avoid.
> >>>
> >>> So basically, is there a way to assign a port to child processes to
> >>> monitor them remotely (asked before by Xun), or would you recommend
> >>> another monitoring tool?
> >>>
> >>> Thank you,
> >>> Mark
> >>>
> >>> On Wed, Feb 29, 2012 at 11:35 AM, Charles Earl <charles.ce...@gmail.com> wrote:
> >>>
> >>>> Mark,
> >>>> So if I understand, it is more the memory management that you are
> >>>> interested in, rather than a need to run an existing C or C++
> >>>> application on the MapReduce platform?
> >>>> Have you done profiling of the application?
> >>>> C
> >>>> On Feb 29, 2012, at 2:19 PM, Mark question wrote:
> >>>>
> >>>>> Thanks Charles ... I'm running Hadoop for research to perform duplicate
> >>>>> detection methods. To go deeper, I need to understand what's slowing my
> >>>>> program, which usually starts with analyzing memory to predict the best
> >>>>> input size for a map task. So you're saying piping can help me control
> >>>>> memory even though it's running on a VM eventually?
> >>>>>
> >>>>> Thanks,
> >>>>> Mark
> >>>>>
> >>>>> On Wed, Feb 29, 2012 at 11:03 AM, Charles Earl <charles.ce...@gmail.com> wrote:
> >>>>>
> >>>>>> Mark,
> >>>>>> Both streaming and pipes allow this, perhaps more so pipes at the
> >>>>>> level of the mapreduce task. Can you provide more details on the
> >>>>>> application?
> >>>>>> On Feb 29, 2012, at 1:56 PM, Mark question wrote:
> >>>>>>
> >>>>>>> Hi guys, thought I should ask this before I use it ... will using C
> >>>>>>> over Hadoop give me the usual C memory management? For example,
> >>>>>>> malloc(), sizeof()? My guess is no, since this all will eventually be
> >>>>>>> turned into bytecode, but I need more control over memory, which
> >>>>>>> obviously is hard for me to do with Java.
> >>>>>>>
> >>>>>>> Let me know of any advantages you know about streaming in C over
> >>>>>>> hadoop.
> >>>>>>> Thank you,
> >>>>>>> Mark
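
On the remote-monitoring question in the thread (assigning a port to child task JVMs so jConsole/VisualVM can attach): one commonly suggested approach, sketched here for a 0.20/1.x-style setup and not taken from this thread, is to pass JMX flags through mapred.child.java.opts. A fixed JMX port can only be bound by one child JVM per node at a time, so this pairs with running the absolute minimum number of tasks, as suggested above. The values below (port 8009, 512m heap) are placeholders:

    <!-- mapred-site.xml: illustrative only -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8009 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false</value>
    </property>
    <!-- one map slot per node, so the fixed port never collides -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>1</value>
    </property>

Then point jConsole or VisualVM at <tasktracker-host>:8009 while a task is running. Disabling authentication and SSL like this is only sensible on a private test cluster.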
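
On the root question (whether C over Hadoop keeps the usual malloc()/sizeof() semantics): with Hadoop Streaming the mapper is an ordinary native executable that reads one record per line on stdin and writes key<TAB>value lines on stdout, so the C heap is managed exactly as in any C program; only the surrounding wrapper task runs in a JVM. A minimal sketch, purely illustrative (the names dup_mapper / mapper.c are placeholders):

    /* mapper.c -- minimal Hadoop Streaming mapper (illustrative sketch).
       Streaming feeds one input record per line on stdin and expects
       "key<TAB>value" lines on stdout. Everything here is ordinary C:
       memory is managed with malloc/realloc/free, no JVM in user code. */
    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        char *line = NULL;   /* getline() grows this buffer with realloc */
        size_t cap = 0;
        ssize_t len;

        while ((len = getline(&line, &cap, stdin)) != -1) {
            /* strip the trailing newline */
            if (len > 0 && line[len - 1] == '\n')
                line[--len] = '\0';

            /* toy key for duplicate detection: emit the record itself as
               the key, so identical records sort together downstream */
            printf("%s\t1\n", line);
        }

        free(line);          /* plain C memory management, as usual */
        return 0;
    }

Compiled and submitted roughly like this (the streaming jar path varies by distribution):

    gcc -O2 -o dup_mapper mapper.c
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -input in/ -output out/ \
        -mapper dup_mapper -file dup_mapper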