[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
---------------------------------

    Description: Memory is a very important resource for bridging the gap 
between CPUs and I/O devices, so the idea is to maximize the use of memory to 
solve the I/O bottleneck problem. We developed a multi-threaded task execution 
engine that runs in a single JVM on a node. In the execution engine, we 
implemented a memory scheduling algorithm to provide global memory management, 
on top of which we developed techniques such as sequential disk access and 
multi-cache, and solved the problem of full garbage collection in the JVM. The 
benchmark results show that it achieves impressive improvements in typical 
cases. When a system is relatively short of memory (e.g., HPC, small- and 
medium-sized enterprises), the improvement will be even more impressive.  
(was: Memory is a very important resource to bridge the gap 
between CPUs and I/O devices. So the idea is to maximize the usage of memory to 
solve the problem of I/O bottleneck. We developed a multi-threaded task 
execution engine, which runs in a single JVM on a node. In the execution 
engine, we have implemented the algorithm of memory scheduling to realize 
global memory management, based on which we further developed the techniques 
such as sequential disk accessing, multi-cache and solved the problem of full 
garbage collection in the JVM. We have conducted extensive experiments with 
comparison against the native Hadoop platform. The results show that the 
Mammoth system can reduce the job execution time by more than 40% in typical 
cases, without requiring any modifications of the Hadoop programs. When a 
system is short of memory, Mammoth can improve the performance by up to 4 
times, as observed for I/O intensive applications, such as PageRank. )

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5605
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 1.0.1
>         Environment: x86-64 Linux/Unix
> jdk7 preferred
>            Reporter: Ming Chen
>            Assignee: Ming Chen
>         Attachments: MAPREDUCE-5605-v1.patch
>
>
> Memory is a very important resource for bridging the gap between CPUs and 
> I/O devices, so the idea is to maximize the use of memory to solve the I/O 
> bottleneck problem. We developed a multi-threaded task execution engine that 
> runs in a single JVM on a node. In the execution engine, we implemented a 
> memory scheduling algorithm to provide global memory management, on top of 
> which we developed techniques such as sequential disk access and 
> multi-cache, and solved the problem of full garbage collection in the JVM. 
> The benchmark results show that it achieves impressive improvements in 
> typical cases. When a system is relatively short of memory (e.g., HPC, 
> small- and medium-sized enterprises), the improvement will be even more 
> impressive.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
