We have a potential parallel application in which a task consists of,
say, 5 subcomponents, each of which can run concurrently.  Each
subcomponent takes on the order of 3 seconds to complete.  We would
like to get a performance gain both from parallel processing within a
single task (the task runs its 5 subcomponents in parallel) and from
running multiple tasks.  Command-line job submission (we're using
pipes) and result communication seem to be the bottlenecks keeping
this process from being as efficient as it can be.
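
For context, a minimal Hadoop Pipes C++ task looks roughly like the
sketch below (this assumes "pipes" above means the Hadoop Pipes API;
the class names and the identity map/reduce bodies are placeholders
standing in for our real subcomponent code):

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include <string>

// Placeholder mapper: each map() call stands in for one ~3 s subcomponent.
class SubcomponentMapper : public HadoopPipes::Mapper {
public:
  SubcomponentMapper(HadoopPipes::TaskContext& context) {}
  void map(HadoopPipes::MapContext& context) {
    // Stand-in for the real ~3 s computation on the input record.
    std::string result = context.getInputValue();
    context.emit(context.getInputKey(), result);
  }
};

// Placeholder reducer: just passes results through unchanged.
class IdentityReducer : public HadoopPipes::Reducer {
public:
  IdentityReducer(HadoopPipes::TaskContext& context) {}
  void reduce(HadoopPipes::ReduceContext& context) {
    while (context.nextValue()) {
      context.emit(context.getInputKey(), context.getInputValue());
    }
  }
};

int main() {
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<SubcomponentMapper, IdentityReducer>());
}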

For example, the startup time for submitting a job via the command
line is on the order of 1.6 s from Cygwin.
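
A measurement like that can be reproduced with a simple wall-clock
timer around the submission command, e.g. (the "hadoop jar" line is
just a placeholder for the actual invocation):

#include <chrono>
#include <cstdlib>
#include <iostream>

int main() {
  auto start = std::chrono::steady_clock::now();
  // Placeholder submission command; substitute the real jar/class names.
  std::system("hadoop jar my-job.jar MyJob input output");
  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start).count();
  std::cout << "job submission took " << ms << " ms" << std::endl;
  return 0;
}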

Is Hadoop not particularly well suited to this type of application?
What are efficient ways to submit jobs and communicate results, in
particular from C++?  Are there any extensions that sit on top of
Hadoop to facilitate this?  I was expecting something like sockets,
SOAP, REST, etc.
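
To make that last point concrete, for result communication I am
picturing something like a persistent TCP connection back to a
collector process, instead of paying a process launch per result.  A
rough sketch of what I mean (the host/port handling and the
newline-delimited framing are made up for illustration):

#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>
#include <string>

// Open one long-lived connection to the result collector.
int connectToCollector(const char* host, uint16_t port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  sockaddr_in addr = {};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  inet_pton(AF_INET, host, &addr.sin_addr);
  if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Push one result over the reused connection (newline framing is
// made up; any agreed-upon wire format would do).
bool sendResult(int fd, const std::string& result) {
  std::string framed = result + "\n";
  return send(fd, framed.data(), framed.size(), 0)
         == static_cast<ssize_t>(framed.size());
}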

Thanks,
Marc
