Ming Chen created MAPREDUCE-5605:
------------------------------------
Summary: Memory-centric MapReduce aiming to solve the I/O bottleneck
Key: MAPREDUCE-5605
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 1.0.1
Environment: x86-64 Linux/Unix; JDK 7 preferred
Reporter: Ming Chen
Assignee: Ming Chen
Memory is an important resource for bridging the gap between CPUs and
I/O devices, so the idea is to maximize memory usage in order to relieve
the I/O bottleneck. We developed Mammoth, a system built around a
multi-threaded task execution engine that runs in a single JVM on a node.
Within the engine, we implemented a memory scheduling algorithm that
provides global memory management. Building on it, we developed techniques
such as sequential disk access and multi-cache, and addressed the problem
of full garbage collection in the JVM. We conducted extensive experiments
comparing Mammoth against the native Hadoop platform. The results show that
Mammoth reduces job execution time by more than 40% in typical cases,
without requiring any modification of existing Hadoop programs. When a
system is short of memory, Mammoth improves performance by up to 4 times,
as observed for I/O-intensive applications such as PageRank.
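
To make the idea of global memory management across task threads in one JVM more concrete, here is a minimal sketch in Java. It is not Mammoth's code: the class name GlobalMemoryBudget, its methods, and the byte sizes are all hypothetical, and the real memory scheduling algorithm is not described in this issue. The sketch only shows several task threads drawing from one node-wide budget and falling back to a disk spill (here just counted) when the budget is exhausted.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical node-wide memory budget shared by all task threads in one JVM.
 * Illustrative only; not taken from the Mammoth implementation.
 */
public class GlobalMemoryBudget {
    private final long capacityBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    public GlobalMemoryBudget(long capacityBytes) {
        if (capacityBytes < 0) {
            throw new IllegalArgumentException("capacity must be non-negative");
        }
        this.capacityBytes = capacityBytes;
    }

    /** Tries to reserve memory; a false result means the caller should spill to disk. */
    public boolean tryReserve(long bytes) {
        while (true) {
            long used = usedBytes.get();
            if (used + bytes > capacityBytes) {
                return false; // budget exhausted
            }
            // Compare-and-set so concurrent task threads never over-commit the budget.
            if (usedBytes.compareAndSet(used, used + bytes)) {
                return true;
            }
        }
    }

    /** Returns previously reserved memory to the shared budget. */
    public void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    public long used() {
        return usedBytes.get();
    }

    public static void main(String[] args) throws InterruptedException {
        GlobalMemoryBudget budget = new GlobalMemoryBudget(1024);
        AtomicLong buffered = new AtomicLong(); // records kept in memory
        AtomicLong spilled = new AtomicLong();  // records that would go to disk

        // Four "task" threads share the single budget instead of each owning a JVM heap.
        Thread[] tasks = new Thread[4];
        for (int i = 0; i < tasks.length; i++) {
            tasks[i] = new Thread(() -> {
                for (int r = 0; r < 8; r++) {
                    if (budget.tryReserve(64)) {
                        buffered.incrementAndGet();
                    } else {
                        spilled.incrementAndGet();
                    }
                }
            });
            tasks[i].start();
        }
        for (Thread t : tasks) {
            t.join();
        }
        System.out.println("buffered=" + buffered.get() + " spilled=" + spilled.get());
    }
}
```

Because a single counter governs every thread, memory left unused by one task is immediately available to another, which is the property a per-task JVM heap cannot offer. A real scheduler would presumably also prioritize among map, shuffle, and reduce buffers; that part is left out here because the issue text does not specify it.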