Need to instrument Hadoop to get comprehensive network traffic metrics
----------------------------------------------------------------------
Key: HADOOP-2830
URL: https://issues.apache.org/jira/browse/HADOOP-2830
Project: Hadoop Core
Issue Type: Improvement
Reporter: Runping Qi
One of the most frequently asked questions about Hadoop performance is: was
the job CPU bound, disk bound, or network bound?
The first two parts can be answered from the metric data of individual
machines, and are therefore relatively easy to answer.
The third part is much harder, especially for a large cluster. To answer the
question, we need to know the following:
1. The network traffic to and from the nodes in the cluster
2. The network traffic going between node pairs through the switch they share
3. The network traffic going through the back links between the switches
With this data, we can get better insight into the relationship between
network bandwidth and Hadoop performance.
We need to instrument the Hadoop code to obtain the above data.
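
As a rough illustration of the three measurements listed above, the sketch
below aggregates per-transfer records into per-node totals, same-switch node
pair totals, and inter-switch link totals. It is not Hadoop code: the record
format (source, destination, bytes), the node-to-switch map, and the names
summarize_traffic, NODE_TO_SWITCH and TRANSFERS are all assumptions made for
the example, and it assumes the instrumentation can emit one such record per
transfer.

```python
from collections import defaultdict

# Hypothetical topology: node name -> the switch it is attached to.
NODE_TO_SWITCH = {
    "node1": "switchA",
    "node2": "switchA",
    "node3": "switchB",
}


def summarize_traffic(transfers, node_to_switch):
    """Aggregate (src, dst, nbytes) records into the three views.

    Returns a tuple of plain dicts:
      per_node     node -> {"sent": bytes, "received": bytes}
      same_switch  (src, dst) -> bytes, for pairs sharing a switch
      inter_switch (src_switch, dst_switch) -> bytes crossing between switches
    """
    per_node = defaultdict(lambda: {"sent": 0, "received": 0})
    same_switch = defaultdict(int)
    inter_switch = defaultdict(int)

    for src, dst, nbytes in transfers:
        # 1. Traffic to and from each node.
        per_node[src]["sent"] += nbytes
        per_node[dst]["received"] += nbytes

        src_switch = node_to_switch[src]
        dst_switch = node_to_switch[dst]
        if src_switch == dst_switch:
            # 2. Traffic between node pairs through their shared switch.
            same_switch[(src, dst)] += nbytes
        else:
            # 3. Traffic over the links between switches.
            inter_switch[(src_switch, dst_switch)] += nbytes

    return dict(per_node), dict(same_switch), dict(inter_switch)


# Made-up transfer records, purely for illustration.
TRANSFERS = [
    ("node1", "node2", 100),
    ("node2", "node1", 40),
    ("node1", "node3", 250),
    ("node3", "node2", 10),
]

per_node, same_switch, inter_switch = summarize_traffic(TRANSFERS, NODE_TO_SWITCH)
```

Keeping the three views separate mirrors the list above: the per-node totals
could be compared against NIC capacity, while the inter-switch totals could be
compared against the capacity of the links between switches.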