[ https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming Chen updated MAPREDUCE-5605:
---------------------------------
Attachment: TextOutputFormat.java
TaskTrackerStatus.java
TaskTrackerInstrumentation.java
TaskTrackerAction.java
TaskTracker.java
TaskStatus.java
TaskScheduler.java
TaskRunner.java
TaskReport.java
TaskMemoryManagerThread.java
TaskLogsTruncater.java
TaskLogServlet.java
TaskLogAppender.java
TaskLog.java
TaskInProgress.java
Task.java
SpillScheduler.java
SequenceFileOutputFormat.java
RunningJob.java
RoundQueue.java
ReinitTrackerAction.java
ReduceTaskStatus.java
ReduceTaskRunner.java
ReduceTask.java
ReduceRamManager.java
RecordReader.java
RawKeyValueIterator.java
RawHistoryFileServlet.java
RawBufferedOutputStream.java
RamManager.java
Partitioner.java
OutputLogFilter.java
OutputFormat.java
OutputCommitter.java
OutputCollector.java
Operation.java
MergeSorter.java
Merger.java
MemoryElement.java
MapTaskStatus.java
MapTaskRunner.java
MapTaskCompletionEventsUpdate.java
MapTask.java
MapRunner.java
MapRamManager.java
MapOutputFile.java
JvmTask.java
JvmManager.java
JVMId.java
JobTaskRunner.java
JobConf.java
IFile.java
DefaultJvmMemoryManager.java
ChildRamManager.java
Child.java
CachePool.java
CacheOutputStream.java
CacheFile.java
> Memory-centric MapReduce aiming to solve the I/O bottleneck
> -----------------------------------------------------------
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix; JDK 7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java,
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java,
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java,
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java,
> OutputCollector.java, OutputCommitter.java, OutputFormat.java,
> OutputLogFilter.java, Partitioner.java, RamManager.java,
> RawBufferedOutputStream.java, RawHistoryFileServlet.java,
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java,
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java,
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java,
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java,
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java,
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java,
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java,
> TaskTrackerAction.java, TaskTrackerInstrumentation.java,
> TaskTrackerStatus.java, TextOutputFormat.java
>
>
> Memory is an important resource for bridging the speed gap between CPUs and
> I/O devices, so the idea is to maximize memory usage to relieve the I/O
> bottleneck. We developed a multi-threaded task execution engine that runs in
> a single JVM on a node. Within the engine, we implemented a memory
> scheduling algorithm that provides global memory management. On top of it,
> we developed techniques such as sequential disk access and multi-cache, and
> we solved the problem of full garbage collection in the JVM. We ran
> extensive experiments comparing the system, called Mammoth, against the
> native Hadoop platform. The results show that Mammoth reduces job execution
> time by more than 40% in typical cases, without requiring any modification
> of Hadoop programs. When a system is short of memory, Mammoth improves
> performance by up to 4 times, as observed for I/O-intensive applications
> such as PageRank.
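
To make the idea of global memory management concrete, here is a minimal sketch of a single memory budget shared by all task threads in one JVM. It is an illustration only and is not taken from the attached files: the class name GlobalMemoryBudget and its methods are hypothetical, and the real scheduling algorithm is presumably more elaborate. A task asks for memory before buffering; if the shared budget is exhausted, it would spill to disk instead of growing a private buffer.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: one memory budget shared by every task thread in a
 * JVM. Not the Mammoth implementation; names and policy are assumptions.
 */
class GlobalMemoryBudget {
    private final long capacityBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    GlobalMemoryBudget(long capacityBytes) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacityBytes = capacityBytes;
    }

    /**
     * Tries to reserve the given number of bytes without blocking.
     * Returns false when the budget is exhausted, in which case the
     * caller is expected to spill to disk instead.
     */
    boolean tryReserve(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        while (true) {
            long used = usedBytes.get();
            // Compare against the remaining space to avoid long overflow.
            if (bytes > capacityBytes - used) {
                return false;
            }
            if (usedBytes.compareAndSet(used, used + bytes)) {
                return true;
            }
        }
    }

    /** Returns previously reserved bytes to the shared budget. */
    void release(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        if (usedBytes.addAndGet(-bytes) < 0) {
            throw new IllegalStateException("released more than was reserved");
        }
    }

    long availableBytes() {
        return capacityBytes - usedBytes.get();
    }

    public static void main(String[] args) {
        GlobalMemoryBudget budget = new GlobalMemoryBudget(100);
        System.out.println(budget.tryReserve(60)); // a map task buffers 60 bytes
        System.out.println(budget.tryReserve(50)); // a second task must spill
        budget.release(60);                        // first task finishes
        System.out.println(budget.tryReserve(50)); // memory is reusable now
    }
}
```

Because the budget is global rather than per task, memory freed by one task becomes available to any other task in the same JVM, which is the property the description relies on.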