[
https://issues.apache.org/jira/browse/HDFS-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897070#action_12897070
]
Konstantin Shvachko commented on HDFS-1338:
-------------------------------------------
The DFSIO benchmark is designed to measure HDFS data transfer performance only.
TestDFSIO is not intended to benchmark a typical MR usage pattern.
TestDFSIO intentionally avoids any overhead or optimizations induced by MR
framework.
The MR scheduler should not be able to do any optimization for TestDFSIO.
It's a simple and straightforward benchmark, and I'd prefer to keep it that way.
It seems you are talking about a different benchmark, one that would allow us to
measure the MR framework optimizations. This makes sense, and it is very sad
that we still don't have any benchmarks dedicated to this area, unless I am
missing something. I think the DFSIO framework can be used for this new benchmark.
What are the main objectives for the new benchmark? As Arun proposed, it should
be able to distinguish between node-local, rack-local and off-switch data
transfers. Anything else?
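To make the three categories concrete, here is a minimal sketch of how a read could be classified. It is an illustration only, not code from TestDFSIO or from the MR scheduler: the class LocalityClassifier, the Location holder, and the host and rack names are all hypothetical, and rack membership is assumed to be a plain string comparison of rack paths.

```java
import java.util.Arrays;
import java.util.List;

public class LocalityClassifier {

    /** The three transfer categories named in the discussion. */
    public enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    /** Hypothetical holder for a host name and its rack path. */
    public static final class Location {
        final String host;
        final String rack;

        public Location(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    /**
     * Returns the best locality the reader can get from any replica:
     * same host beats same rack, and same rack beats everything else.
     */
    public static Locality classify(Location reader, List<Location> replicas) {
        Locality best = Locality.OFF_SWITCH;
        for (Location replica : replicas) {
            if (replica.host.equals(reader.host)) {
                return Locality.NODE_LOCAL;      // cannot do better than this
            }
            if (replica.rack.equals(reader.rack)) {
                best = Locality.RACK_LOCAL;      // keep looking for a same-host replica
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed example topology: the reader shares a rack, but not a host,
        // with one of the two replicas.
        Location reader = new Location("dn1", "/rack1");
        List<Location> replicas = Arrays.asList(
                new Location("dn2", "/rack1"),
                new Location("dn5", "/rack2"));
        System.out.println(classify(reader, replicas));
    }
}
```

A benchmark built along these lines could tag each measured read with its category and report throughput per category rather than a single aggregate.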
In my view, Hong's bullet points are well-formulated practices for running DFSIO
on a cluster to make the results meaningful.
I'd add one thing: turn off logging.
TestDFSIO is part of MapReduce now, so this jira should rather be filed
there. We can keep the discussion here, and create an MR jira later to commit
the code once a patch is ready.
> Improve TestDFSIO
> -----------------
>
> Key: HDFS-1338
> URL: https://issues.apache.org/jira/browse/HDFS-1338
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Arun C Murthy
>
> Currently the read test in TestDFSIO benchmark just opens a large side file
> and measures the read performance. The MR scheduler has no opportunity to do
> *any* optimization for the TestDFSIO MR application. The side-effect of this
> is that it is *very* hard to do any meaningful analysis of the results of the
> benchmark i.e. to check if node-local or rack-local or off-switch read
> performance improved/degraded.