[ https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
mingleizhang updated MAPREDUCE-6729:
------------------------------------
Description:
When TestDFSIO is used as a distributed I/O benchmark, especially for writing or
reading a large number of files, both performance and result precision suffer.
The problem is that the existing code deletes the previous output directories
inside the timed region: tStart is captured before writeTest() is called, and
writeTest() performs the recursive fs.delete() calls before runIOTest(). The
deletion therefore adds extra time to execTime, which slows the run and
introduces timing errors and imprecise throughput figures. We need to replace or
improve this hack to prevent this from happening in the future.
{code}
// Excerpts from TestDFSIO (two separate methods).
public static void testWrite() throws Exception {
  FileSystem fs = cluster.getFileSystem();
  long tStart = System.currentTimeMillis();
  // writeTest() deletes old output before writing, so that cost lands in execTime.
  bench.writeTest(fs);
  long execTime = System.currentTimeMillis() - tStart;
  bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
}

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  // Recursively remove the data and write directories left by earlier runs.
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
}
{code}
[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]
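A minimal sketch of one possible improvement, assuming the goal is simply to keep cleanup out of the measured interval: remove previous output first, then start the timer, so only the write phase is timed. It uses plain java.nio.file on the local disk as a stand-in for the Hadoop FileSystem calls; the class and method names (TimedWriteSketch, deleteRecursively, writeFiles, timedWrite) are hypothetical and are not part of TestDFSIO.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch (not TestDFSIO code): previous output is removed BEFORE
// the timer starts, so only the write phase is measured. java.nio.file on the
// local disk stands in for Hadoop's FileSystem API.
final class TimedWriteSketch {

  /** Recursively deletes dir if it exists (local stand-in for fs.delete(dir, true)). */
  static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse lexicographic order visits children before their parent directories.
      List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      for (Path p : paths) {
        Files.delete(p);
      }
    }
  }

  /** Hypothetical write phase: writes nFiles files of fileSize bytes (all zeros) each. */
  static void writeFiles(Path dir, int nFiles, int fileSize) throws IOException {
    Files.createDirectories(dir);
    byte[] buf = new byte[fileSize];
    for (int i = 0; i < nFiles; i++) {
      Files.write(dir.resolve("file_" + i), buf);
    }
  }

  /** Cleans up untimed, then returns only the elapsed write time in milliseconds. */
  static long timedWrite(Path dir, int nFiles, int fileSize) throws IOException {
    deleteRecursively(dir);            // setup: not part of the measurement
    long tStart = System.nanoTime();   // timer starts after cleanup
    writeFiles(dir, nFiles, fileSize);
    return (System.nanoTime() - tStart) / 1_000_000;
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("dfsio-sketch");
    Path dir = base.resolve("io_write");
    timedWrite(dir, 10, 1024);            // first run leaves output behind
    long ms = timedWrite(dir, 10, 1024);  // second run removes it outside the timer
    System.out.println("write time (ms): " + ms);
    deleteRecursively(base);
  }
}
```

Applied to TestDFSIO, the analogous (untested) change would be to issue the two fs.delete(..., true) calls before tStart is captured in testWrite(), rather than inside writeTest(), so that execTime covers only runIOTest().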
was:
When TestDFSIO is used as a distributed I/O benchmark, especially for writing or
reading a large number of files, both performance and result precision suffer.
The problem is that the existing code deletes files before running a job, which
costs time and in turn causes performance issues, timing errors and imprecise
throughput figures. We need to replace or improve this hack to prevent this from
happening in the future.
{code}
public static void testWrite() throws Exception {
  FileSystem fs = cluster.getFileSystem();
  long tStart = System.currentTimeMillis();
  bench.writeTest(fs);
  long execTime = System.currentTimeMillis() - tStart;
  bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
}

private void writeTest(FileSystem fs) throws IOException {
  Path writeDir = getWriteDir(config);
  fs.delete(getDataDir(config), true);
  fs.delete(writeDir, true);
  runIOTest(WriteMapper.class, writeDir);
}
{code}
[https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]
> Hitting performance and error when lots of files to write or read
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-6729
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: benchmarks, performance, test
> Reporter: mingleizhang
> Priority: Minor
> Labels: performance, test
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)