[
https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365846#comment-15365846
]
ASF GitHub Bot commented on MAPREDUCE-6729:
-------------------------------------------
GitHub user zhangminglei opened a pull request:
https://github.com/apache/hadoop/pull/111
MAPREDUCE-6729. Hitting performance and error when lots of files to write or read.
When TestDFSIO is run as a distributed I/O benchmark that writes a large
number of files to disk or reads them back, the results can be both slow and
imprecise. The problem is that the existing implementation deletes the old
files before running a job; with many files this cleanup takes considerable
extra time, and because it happens inside the timed interval it distorts the
measured execution time and the reported throughput. We should replace or
improve this behavior to prevent that.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhangminglei/hadoop trunk
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/hadoop/pull/111.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #111
----
commit 380cea926aa7219c4e8444692957f6d1b04c9849
Author: zhangminglei <[email protected]>
Date: 2016-07-07T09:00:16Z
MAPREDUCE-6729. Hitting performance and error when lots of files to write or read.
When TestDFSIO is run as a distributed I/O benchmark that writes a large
number of files to disk or reads them back, the results can be both slow and
imprecise. The problem is that the existing implementation deletes the old
files before running a job; with many files this cleanup takes considerable
extra time, and because it happens inside the timed interval it distorts the
measured execution time and the reported throughput. We should replace or
improve this behavior to prevent that.
----
> Hitting performance and error when lots of files to write or read
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: benchmarks, performance, test
> Reporter: mingleizhang
> Priority: Minor
> Labels: performance, test
>
> When TestDFSIO is run as a distributed I/O benchmark that writes a large
> number of files to disk or reads them back, the results can be both slow and
> imprecise. The problem is that the existing implementation deletes the old
> files before running a job; with many files this cleanup takes considerable
> extra time, and because it happens inside the timed interval it distorts the
> measured execution time and the reported throughput. We should replace or
> improve this behavior to prevent that.
> {code}
> public static void testWrite() throws Exception {
>   FileSystem fs = cluster.getFileSystem();
>   long tStart = System.currentTimeMillis();
>   // This call incurs extra time because writeTest() first removes the
>   // existing data with fs.delete(..., true), inside the timed interval.
>   bench.writeTest(fs);
>   long execTime = System.currentTimeMillis() - tStart;
>   bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
> }
>
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
> }
> {code}
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]
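The timing skew can be illustrated with a small plain-Java sketch (hypothetical names; it uses `java.nio.file` rather than Hadoop's `FileSystem` API, and is not the committed fix): if the recursive delete runs before the timer starts, the cleanup cost of stale files from a previous run stays out of the measured execution time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class BenchmarkTimingSketch {
    // Hypothetical stand-in for fs.delete(path, true): removes a
    // directory tree, deepest entries first.
    static void deleteRecursively(Path dir) throws IOException {
        if (Files.notExists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path dataDir = Files.createTempDirectory("dfsio-sketch");
        Files.writeString(dataDir.resolve("stale.dat"),
                "leftover output from a previous run");

        // Cleanup happens BEFORE the timer starts, so deleting stale
        // files does not inflate the measured execution time.
        deleteRecursively(dataDir);

        long tStart = System.currentTimeMillis();
        // ... the actual I/O job would run here (runIOTest in TestDFSIO) ...
        long execTime = System.currentTimeMillis() - tStart;

        System.out.println("dataDir removed: " + Files.notExists(dataDir));
        System.out.println("execTime measures only the job: " + (execTime >= 0));
    }
}
```

In TestDFSIO itself the cleanup sits inside `writeTest()`, which is called after `tStart` is taken, so the equivalent change there would be to move the deletes out of the timed region (or skip them when the target directories are already gone).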
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]