We have a potential parallel application in which a task consists of, say, 5 subcomponents, each of which can run concurrently. Each subcomponent takes on the order of 3 seconds to complete. We would like to get a performance gain from parallel processing within a single task (a task runs its 5 subcomponents concurrently), as well as across multiple tasks. Command-line job submission (we're using Pipes) and result communication seem to be bottlenecks keeping this process from being as efficient as it can be.
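To put numbers on the bottleneck, here is a rough back-of-envelope sketch (assuming the ~1.6 s command-line submission cost mentioned below is paid once per task, and that all 5 subcomponents really do run fully concurrently):

```python
# Back-of-envelope timing, using the figures from this post.
WORK_PER_SUBCOMPONENT = 3.0  # seconds per subcomponent
N_SUBCOMPONENTS = 5
SUBMIT_OVERHEAD = 1.6        # seconds to submit a job from the command line

# Running the 5 subcomponents one after another, no submission cost:
serial = N_SUBCOMPONENTS * WORK_PER_SUBCOMPONENT          # 15.0 s

# Ideal parallel case: one submission, then all 5 run concurrently:
parallel = SUBMIT_OVERHEAD + WORK_PER_SUBCOMPONENT        # 4.6 s

speedup = serial / parallel
overhead_fraction = SUBMIT_OVERHEAD / parallel

print(f"serial: {serial:.1f}s  parallel: {parallel:.1f}s")
print(f"speedup: {speedup:.2f}x (vs. {N_SUBCOMPONENTS}x with no overhead)")
print(f"submission is {overhead_fraction:.0%} of the parallel runtime")
```

So even in the best case the submission overhead eats roughly a third of the per-task runtime, which is why it dominates our thinking here.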
E.g., the startup time for submitting a job via the command line is on the order of 1.6 s from Cygwin. Is Hadoop not particularly well suited to this type of application? What are efficient ways to submit jobs and communicate results, in particular from C++? Are there any extensions that sit on top of Hadoop to facilitate this? I expected something like sockets, SOAP, REST, etc.

Thanks,
Marc
