[ https://issues.apache.org/jira/browse/HADOOP-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627633#action_12627633 ]
Alejandro Abdelnur commented on HADOOP-3748:
--------------------------------------------
We've run a test on a 400-node cluster, and the increase in network pkg_in/min
at the JT is about 67%.
Consolidating numbers:
|| Cluster size || pkg_in idle || pkg_in w/o counters || pkg_in w/ counters || pkg_in % increase || pkg_in % increase excluding idle load ||
| 100 | 654 | 1080 | 1380 | 28% | 70% |
| 200 | 397 | 591 | 945 | 60% | 182% |
| 400 | 1660 | 1840 | 3070 | 67% | 683% |
* pkg_in values are per minute.
* Network load in the 100-node cluster was consistently higher, even in idle
mode.
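For anyone who wants to recheck the two percentage columns, here is a minimal sketch that recomputes them from the raw per-minute counts. The formulas are an assumption inferred from the column headings: the plain increase compares the w/ and w/o counters values directly, and the "excluding idle load" variant subtracts the idle value from both before comparing. The class and method names are made up for illustration.

```java
import java.util.Locale;

/** Recomputes the derived percentage columns from the raw pkg_in counts. */
final class PkgInOverhead {

    /** Percentage increase of withCounters over withoutCounters. */
    static double pctIncrease(double withoutCounters, double withCounters) {
        return (withCounters - withoutCounters) / withoutCounters * 100.0;
    }

    /** Same increase after subtracting the idle baseline from both values (assumed formula). */
    static double pctIncreaseExcludingIdle(double idle, double withoutCounters, double withCounters) {
        return pctIncrease(withoutCounters - idle, withCounters - idle);
    }

    public static void main(String[] args) {
        // Rows copied from the table: cluster size, idle, w/o counters, w/ counters.
        int[][] rows = {
            {100, 654, 1080, 1380},
            {200, 397, 591, 945},
            {400, 1660, 1840, 3070},
        };
        for (int[] r : rows) {
            System.out.printf(Locale.ROOT,
                "%d nodes: %.0f%% increase, %.0f%% excluding idle load%n",
                r[0],
                pctIncrease(r[2], r[3]),
                pctIncreaseExcludingIdle(r[1], r[2], r[3]));
        }
    }
}
```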
As I've mentioned before, our tests focused on finding the overhead of
using 200 counters, not on the overall performance impact. The test jobs were
not generating any input/output data network traffic.
> Flag to make tasks to send counter information only at the end of the task
> --------------------------------------------------------------------------
>
> Key: HADOOP-3748
> URL: https://issues.apache.org/jira/browse/HADOOP-3748
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Environment: all
> Reporter: Alejandro Abdelnur
> Attachments: [HADOOP-3748].patch
>
>
> Currently, counters are streamed from the task to the JobTracker as the task
> progresses. If the number of counters is large, this has a significant impact
> on network traffic as well as on the JobTracker load.
> There should be a flag, for example per counter-group, that indicates that
> the counters are to be reported only at the end of the task. By default this
> flag should be set to false for all counter-groups, maintaining the current
> behavior.
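To make the proposal above concrete, here is a rough, self-contained sketch of a per-group "report at end" flag. It does not use any Hadoop API and is not the attached patch; every class and method name in it is hypothetical. Groups default to the current behavior (included in every status update), and a flagged group is held back until the task is done.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from Hadoop. */
final class DeferredCounterSketch {
    // group name -> (counter name -> value)
    private final Map<String, Map<String, Long>> groups = new LinkedHashMap<>();
    // groups flagged to be reported only when the task finishes (empty by default)
    private final Set<String> reportAtEnd = new HashSet<>();

    /** Sets or clears the per-group flag; unflagged groups keep streaming. */
    void setReportAtEnd(String group, boolean atEnd) {
        if (atEnd) {
            reportAtEnd.add(group);
        } else {
            reportAtEnd.remove(group);
        }
    }

    void increment(String group, String counter, long delta) {
        groups.computeIfAbsent(group, g -> new HashMap<>())
              .merge(counter, delta, Long::sum);
    }

    /** Counters to include in a status update; flagged groups appear only when taskDone. */
    Map<String, Map<String, Long>> snapshot(boolean taskDone) {
        Map<String, Map<String, Long>> out = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Long>> e : groups.entrySet()) {
            if (taskDone || !reportAtEnd.contains(e.getKey())) {
                out.put(e.getKey(), new HashMap<>(e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        DeferredCounterSketch task = new DeferredCounterSketch();
        task.setReportAtEnd("app", true);       // defer the large application group
        task.increment("framework", "records", 10);
        task.increment("app", "custom", 3);
        System.out.println("in-progress update: " + task.snapshot(false).keySet());
        System.out.println("final update:       " + task.snapshot(true).keySet());
    }
}
```

With a flag like this, the in-progress updates carry only the unflagged groups, so the per-update payload no longer grows with the number of deferred counters.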