[
https://issues.apache.org/jira/browse/HDFS-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990244#comment-14990244
]
Rakesh R commented on HDFS-8968:
--------------------------------
Good work [~lirui]. I have a few comments; please take a look.
# Can we make this configurable, for example via
{{System.getProperty("test.benchmark.data", "/tmp/benchmark/data")}}?
{code}
private static final String DFS_TMP_DIR = "/tmp/benchmark";
{code}
# The {{printUsage}} output can be highlighted by using {{System.err.println}}.
Also, the message could begin with {{"Usage: ErasureCodeBenchmarkThroughput"}}.
{code}
System.out.println("ErasureCodeBenchmarkThroughput <read|write|gen|clean> "
+ "<size in MB> <ec|rep> [num clients] [stf|pos]\n" +
"Stateful and positional option is only available for read.");
{code}
# It would be good to use the Hadoop utility {{StopWatch}} for the elapsed-time
computations. Presently it uses
{{(System.currentTimeMillis() - start) / 1000.0}}.
Sample usage:
{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.util.StopWatch;

StopWatch sw = new StopWatch().start();
// do the operation
sw.stop();
long elapsedTime = sw.now(TimeUnit.SECONDS); // elapsed time in whole seconds
{code}
# Just a suggestion: use {{java.util.concurrent.ExecutorCompletionService}}
here rather than trying to find out which task has completed.
{code}
+ for (Future<Long> future : futures) {
+ results.add(future.get());
+ }
{code}
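A minimal sketch of the {{ExecutorCompletionService}} suggestion, assuming each benchmark client is a {{Callable<Long>}}. The class {{CompletionOrderSketch}} and the helpers {{client}} and {{runClients}} are hypothetical names for illustration and are not taken from the patch; the sleeping task only stands in for real read/write work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionOrderSketch {

  // Hypothetical stand-in for one benchmark client: sleeps, then returns a value.
  static Callable<Long> client(final long sleepMs, final long result) {
    return new Callable<Long>() {
      @Override
      public Long call() throws Exception {
        Thread.sleep(sleepMs);
        return result;
      }
    };
  }

  // Submits numClients tasks and collects their results as each one finishes.
  static List<Long> runClients(int numClients)
      throws InterruptedException, ExecutionException {
    ExecutorService executor = Executors.newFixedThreadPool(numClients);
    CompletionService<Long> completionService =
        new ExecutorCompletionService<Long>(executor);
    try {
      for (int i = 0; i < numClients; i++) {
        // Earlier clients sleep longer, so completion order differs
        // from submission order in most runs.
        completionService.submit(client((numClients - i) * 50L, i));
      }
      List<Long> results = new ArrayList<Long>(numClients);
      for (int i = 0; i < numClients; i++) {
        // take() blocks until the next task completes, whichever it is.
        results.add(completionService.take().get());
      }
      return results;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runClients(4));
  }
}
```

With this approach a failed client surfaces as soon as it completes, instead of only when the loop reaches its {{Future}} in submission order.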
bq. As to unit test, maybe I can add a test where the tool runs against a
MiniDFSCluster.
How about supporting both a real cluster and a MiniDFSCluster inside the
{{ErasureCodeBenchmarkThroughput}} tool, similar to
{{org.apache.hadoop.hdfs.BenchmarkThroughput}}?
> New benchmark throughput tool for striping erasure coding
> ---------------------------------------------------------
>
> Key: HDFS-8968
> URL: https://issues.apache.org/jira/browse/HDFS-8968
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Rui Li
> Attachments: HDFS-8968-HDFS-7285.1.patch,
> HDFS-8968-HDFS-7285.2.patch, HDFS-8968.3.patch
>
>
> We need a new benchmark tool to measure the throughput of client writing and
> reading considering cases or factors:
> * 3-replica or striping;
> * write or read, stateful read or positional read;
> * which erasure coder;
> * striping cell size;
> * concurrent readers/writers using processes or threads.
> The tool should be easy to use and better to avoid unnecessary local
> environment impact, like local disk.