[
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ming Chen updated MAPREDUCE-5605:
---------------------------------
Attachment: (was: TextOutputFormat.java)
> Memory-centric MapReduce aiming to solve the I/O bottleneck
> -----------------------------------------------------------
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: MapTaskStatus.java, MemoryElement.java,
> MergeSorter.java, Merger.java, Operation.java, OutputCollector.java,
> OutputCommitter.java, OutputFormat.java, OutputLogFilter.java,
> Partitioner.java, RamManager.java, RawBufferedOutputStream.java,
> RawHistoryFileServlet.java, RawKeyValueIterator.java, RecordReader.java,
> ReduceRamManager.java, ReduceTask.java, ReduceTaskRunner.java,
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java,
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java,
> Task.java, TaskInProgress.java, TaskLog.java, TaskLogAppender.java,
> TaskLogServlet.java, TaskLogsTruncater.java, TaskMemoryManagerThread.java,
> TaskReport.java, TaskRunner.java, TaskScheduler.java, TaskStatus.java,
> TaskTracker.java, TaskTrackerAction.java, TaskTrackerInstrumentation.java
>
>
> Memory is a key resource for bridging the gap between CPUs and I/O
> devices, so the idea is to maximize memory usage to relieve the I/O
> bottleneck. We developed a multi-threaded task execution engine that runs
> in a single JVM on a node. Within the engine, we implemented a memory
> scheduling algorithm that provides global memory management. On top of it,
> we developed techniques such as sequential disk access and multi-cache, and
> we solved the problem of full garbage collection in the JVM. We conducted
> extensive experiments comparing the resulting system, Mammoth, against the
> native Hadoop platform. The results show that Mammoth can reduce job
> execution time by more than 40% in typical cases, without requiring any
> modifications to Hadoop programs. When a system is short of memory, Mammoth
> can improve performance by up to 4 times, as observed for I/O-intensive
> applications such as PageRank.