[ https://issues.apache.org/jira/browse/HDFS-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897137#action_12897137 ]
Arun C Murthy commented on HDFS-1338:
-------------------------------------
{quote}
DFSIO benchmark is designed to measure HDFS data transfer performance only.
TestDFSIO is not intended to benchmark typical MR usage pattern.
TestDFSIO intentionally avoids any overhead or optimizations induced by MR
framework.
{quote}
A benchmark should be something we use to reason about a particular aspect of
the framework, in this case performance.
The point I'm trying to make is that TestDFSIO, as it stands, is formulated in
a way that makes it impossible to reason about its results. I don't
particularly care how we implement it, and I agree it shouldn't be constrained
by the vagaries of the Map-Reduce scheduler. However, we do need a benchmark
which does node-local, rack-local, and off-switch reads and writes in a
predictable manner, so that when we notice a difference in the results of the
benchmark we are in a position to reason about it.
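
To make the three cases concrete, here is a minimal sketch of how a single read
could be labelled by locality so that results can be bucketed and compared. It
is not TestDFSIO code and does not use any Hadoop API: the class ReadLocality,
the Replica holder, and the host and rack strings are all hypothetical names
introduced for illustration, and it assumes the reader's host and rack and the
block's replica locations are already known.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical types, not part of Hadoop or TestDFSIO.
class ReadLocality {

    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    // Hypothetical holder for one replica's location.
    static final class Replica {
        final String host;
        final String rack;

        Replica(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    // Returns the best locality the reader can get for a block:
    // same host beats same rack, which beats everything else.
    static Locality classify(String readerHost, String readerRack,
                             List<Replica> replicas) {
        Locality best = Locality.OFF_SWITCH;
        for (Replica r : replicas) {
            if (r.host.equals(readerHost)) {
                return Locality.NODE_LOCAL;
            }
            if (r.rack.equals(readerRack)) {
                best = Locality.RACK_LOCAL;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up topology: two replicas on rack1, one on rack2.
        List<Replica> replicas = Arrays.asList(
            new Replica("host1", "/rack1"),
            new Replica("host2", "/rack1"),
            new Replica("host5", "/rack2"));

        System.out.println(classify("host1", "/rack1", replicas));
        System.out.println(classify("host3", "/rack1", replicas));
        System.out.println(classify("host9", "/rack3", replicas));
    }
}
```

A benchmark that recorded a label like this alongside each measured read or
write could report throughput per locality class, which is the kind of
breakdown the comment above is asking for.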
> Improve TestDFSIO
> -----------------
>
> Key: HDFS-1338
> URL: https://issues.apache.org/jira/browse/HDFS-1338
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Arun C Murthy
>
> Currently the read test in TestDFSIO benchmark just opens a large side file
> and measures the read performance. The MR scheduler has no opportunity to do
> *any* optimization for the TestDFSIO MR application. The side-effect of this
> is that it is *very* hard to do any meaningful analysis of the results of the
> benchmark, i.e. to check whether node-local, rack-local, or off-switch read
> performance improved or degraded.