[ https://issues.apache.org/jira/browse/HDFS-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896935#action_12896935 ]
Arun C Murthy commented on HDFS-1338:
-------------------------------------
Some possible enhancements we could aim for, under certain assumptions, are:
# Assume that the cluster is empty and TestDFSIO is the *only* application
running
# Get all the nodes in the cluster
# Implement a custom input-split to write only one replica of each file to a
subset of the nodes in the cluster
# Read the replicas in a manner that ensures equal numbers of node-local,
rack-local and off-switch reads.
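To make the last two steps concrete, here is a minimal sketch of the read planning, under the assumptions listed above. It does not use any Hadoop API: the LocalityPlan class, its Node descriptor and the planReads helper are hypothetical names invented for illustration, and each file is assumed to have exactly one replica on a known node.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class LocalityPlan {
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    /** Hypothetical node descriptor: host name plus rack id. */
    static final class Node {
        final String host;
        final String rack;
        Node(String host, String rack) { this.host = host; this.rack = rack; }
    }

    /** Classifies a read by comparing the reader's location with the single replica's. */
    static Locality classify(Node reader, Node replica) {
        if (reader.host.equals(replica.host)) return Locality.NODE_LOCAL;
        if (reader.rack.equals(replica.rack)) return Locality.RACK_LOCAL;
        return Locality.OFF_SWITCH;
    }

    /** Returns the first cluster node whose read of the replica has the wanted locality, or null. */
    static Node pickReader(List<Node> cluster, Node replica, Locality wanted) {
        for (Node n : cluster) {
            if (classify(n, replica) == wanted) return n;
        }
        return null;
    }

    /** Assigns one reader per replica, cycling through the three locality classes round-robin. */
    static Map<Locality, Integer> planReads(List<Node> cluster, List<Node> replicas) {
        Map<Locality, Integer> counts = new EnumMap<>(Locality.class);
        Locality[] order = Locality.values();
        for (Locality l : order) counts.put(l, 0);
        for (int i = 0; i < replicas.size(); i++) {
            Locality wanted = order[i % order.length];
            Node reader = pickReader(cluster, replicas.get(i), wanted);
            if (reader == null) {
                throw new IllegalStateException(
                        "no " + wanted + " reader for replica on " + replicas.get(i).host);
            }
            counts.merge(wanted, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Node> cluster = new ArrayList<>();
        cluster.add(new Node("h1", "r1"));
        cluster.add(new Node("h2", "r1"));
        cluster.add(new Node("h3", "r2"));
        cluster.add(new Node("h4", "r2"));
        // One single-replica file per node in the chosen subset (here: h1, h2, h3).
        List<Node> replicas = cluster.subList(0, 3);
        System.out.println(planReads(cluster, replicas));
    }
}
```

The counts come out equal only when the number of files is a multiple of three and every rack has at least two nodes. In a real MR job the chosen reader would have to be expressed through the split's location hints, which the scheduler treats as a preference rather than a guarantee.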
Thoughts?
> Improve TestDFSIO
> -----------------
>
> Key: HDFS-1338
> URL: https://issues.apache.org/jira/browse/HDFS-1338
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Arun C Murthy
>
> Currently the read test in the TestDFSIO benchmark just opens a large side
> file and measures the read performance. The MR scheduler has no opportunity
> to do *any* optimization for the TestDFSIO MR application. As a result, it is
> *very* hard to do any meaningful analysis of the benchmark results, i.e. to
> check whether node-local, rack-local or off-switch read performance improved
> or degraded.