Ability to thread task execution
--------------------------------

                 Key: HADOOP-2990
                 URL: https://issues.apache.org/jira/browse/HADOOP-2990
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
         Environment: All
            Reporter: Holden Robbins


Currently Hadoop spawns a single-threaded JVM for each task.  While good for 
many tasks, this does not maximize resource usage for slaves that have many 
cores (machines with more cores are getting more cost-effective every day) 
_and_ are running jobs that require many gigabytes of read-only in-memory 
resources to maximize throughput.  Running in separate JVMs requires 
redundantly loading large amounts of data, which reduces the number of parallel 
tasks that can run per machine even though more CPUs are available.
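
As an illustration of the memory argument above, here is a minimal sketch in 
plain Java (not Hadoop code, and not the patch under test).  A small map stands 
in for the large read-only resource; it is loaded once and read by several 
worker threads in the same JVM.  The names SharedResourceSketch, 
loadLookupTable and process are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedResourceSketch {

    // Hypothetical stand-in for a multi-gigabyte read-only resource.
    static Map<String, Integer> loadLookupTable() {
        Map<String, Integer> table = new HashMap<>();
        table.put("a", 1);
        table.put("b", 2);
        table.put("c", 3);
        return Collections.unmodifiableMap(table);
    }

    // Looks up every record on a worker thread and sums the results.
    // The table is loaded once; all threads read the same copy.
    static int process(List<String> records, int numThreads) {
        final Map<String, Integer> table = loadLookupTable();
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (final String record : records) {
                results.add(pool.submit(() -> table.getOrDefault(record, 0)));
            }
            int sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get();
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("a", "b", "c", "a"), 4));
    }
}
```

With one JVM per task, each of the four workers would need its own copy of the 
table; here the memory cost is paid once regardless of the thread count.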

Adding this ability will give Hadoop users the flexibility to balance their 
need to maximize memory usage and throughput against their need for task 
segmentation.

Note: This is a blocking bug in porting processes over to Hadoop for my own 
organization.  I am testing a patch for this now that leaves the existing 
behavior for single-threaded operation intact.  All synchronization is done 
through wrapper classes and helper methods and should not add any overhead to 
non-threaded processes.
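
One way such a wrapper could look is sketched below.  This is an assumption 
for illustration only, not the code in the patch: the Collector interface, 
SynchronizedCollector, wrapForThreads and runWorkers are hypothetical names.  
The helper returns the original object when only one thread is used, so the 
single-threaded path takes no lock.

```java
import java.util.ArrayList;
import java.util.List;

public class SyncWrapperSketch {

    // Hypothetical output interface, standing in for whatever a task writes to.
    interface Collector {
        void collect(String key, int value);
    }

    // Plain, unsynchronized collector: safe only with a single writer thread.
    static class ListCollector implements Collector {
        final List<String> out = new ArrayList<>();
        public void collect(String key, int value) {
            out.add(key + "=" + value);
        }
    }

    // Wrapper that serializes access to the underlying collector.
    static class SynchronizedCollector implements Collector {
        private final Collector delegate;
        SynchronizedCollector(Collector delegate) {
            this.delegate = delegate;
        }
        public synchronized void collect(String key, int value) {
            delegate.collect(key, value);
        }
    }

    // Helper: wrap only when more than one thread will write.
    static Collector wrapForThreads(Collector c, int numThreads) {
        return numThreads > 1 ? new SynchronizedCollector(c) : c;
    }

    // Runs numThreads workers that each emit perThread records,
    // then returns how many records were collected.
    static int runWorkers(int numThreads, int perThread) {
        ListCollector raw = new ListCollector();
        final Collector shared = wrapForThreads(raw, numThreads);
        Thread[] workers = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.collect("worker" + id, i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return raw.out.size();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1000));
    }
}
```

Because the decision is made once in the helper, task code calls collect() the 
same way in both modes and does not need to know whether it is threaded.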

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
