[ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618896#action_12618896 ]

Chris Douglas commented on HADOOP-3062:
---------------------------------------

The analysis should leverage HADOOP-3719, so this issue should cover the log4j 
appender emitting the HDFS and shuffling data. There are a few open questions 
and arguable assumptions:

* Should this count bytes successfully transferred separately from failed 
transfers? Should failed transfers be logged at all?
* The header/metadata/etc. traffic is assumed to be a negligible fraction of 
the total network traffic and irrelevant to the analysis for a particular job. 
The overall network utilization is also best measured using standard monitoring 
utilities that don't require any knowledge of Hadoop. This will focus only on 
tracking block traffic over HDFS (reads, writes, replications) and map output 
fetched during the shuffle.
* For local reads, the source and destination IP will match. This should be 
sufficient to detect and discard such entries during analysis of network 
traffic, but will not be sufficient to account for all reads from the local 
disk (counters and job history are likely better tools for this).
* Accounting for topology (to break down by racks, etc.) is best deferred to 
the analysis. Logging changes in topology would also be helpful, though I don't 
know whether Hadoop has sufficient information to do this in the general case.
* If job information is available (in the shuffle), should it be included in 
the entry? Doing this for HDFS is non-trivial, but would be invaluable to the 
analysis. I'm not certain how to do this, yet. Of course, replications and 
rebalancing won't include this, and HDFS reads prior to job submission (and all 
other traffic from JobClient) will likely be orphaned, as well.
* Should this include start/end entries so one can infer how long the transfer 
took?
* What about DistributedCache? Can it be ignored as part of the job setup, 
which is already omitted?

In general, the format will follow:
{noformat}
<log4j schema including timestamp, etc.> source: <src IP>, destination: <dst IP>, bytes: <bytes>, operation: <op enum>[, taskid: <TaskID>]
{noformat}

Where {{<(src|dst) IP>}} is the IP address of the source and destination nodes, 
{{<bytes>}} is a long, and {{<op enum>}} is one of {{HDFS_READ}}, 
{{HDFS_WRITE}}, {{HDFS_COPY}}, and {{MAPRED_SHUFFLE}}. {{HDFS_REPLACE}} should 
be redundant if {{HDFS_COPY}} is recorded (I think). The rebalancing traffic 
isn't relevant to job analysis, but if one is including sufficient information 
to determine the duration of each transfer it may be interesting. The TaskID 
should be sufficient, but one could argue that including the JobID would be 
useful as a point to join on.
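A minimal Java sketch of what an entry in this format might look like, and of the local-read check described above. Class, method, and field names here are illustrative only, not taken from any patch; log4j itself would prepend the timestamp/level schema:

```java
/**
 * Hypothetical sketch of the proposed transfer-log entry. All names are
 * illustrative; only the field layout and the operation values follow the
 * format proposed in the comment.
 */
public class TransferLogSketch {

  /** Operations proposed in the comment. */
  enum Op { HDFS_READ, HDFS_WRITE, HDFS_COPY, MAPRED_SHUFFLE }

  /**
   * Render the message portion of one entry; a log4j layout would supply
   * the leading timestamp/level schema. The taskid field is optional.
   */
  static String format(String srcIp, String dstIp, long bytes, Op op,
                       String taskId) {
    StringBuilder sb = new StringBuilder();
    sb.append("source: ").append(srcIp)
      .append(", destination: ").append(dstIp)
      .append(", bytes: ").append(bytes)
      .append(", operation: ").append(op);
    if (taskId != null) {
      sb.append(", taskid: ").append(taskId);
    }
    return sb.toString();
  }

  /**
   * Local reads have matching source and destination IPs, so the analysis
   * can detect and discard them when measuring network traffic.
   */
  static boolean isLocal(String srcIp, String dstIp) {
    return srcIp.equals(dstIp);
  }
}
```

For example, a 64 MB HDFS block read would be rendered as {{source: 10.0.0.1, destination: 10.0.0.2, bytes: 67108864, operation: HDFS_READ}}, with no taskid since HDFS reads don't carry job information.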

Thoughts?

> Need to capture the metrics for the network I/Os generated by dfs reads/writes 
> and map/reduce shuffling and break them down by racks 
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3062
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3062
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Runping Qi
>
> In order to better understand the relationship between Hadoop performance and 
> network bandwidth, we need to know the aggregate traffic in a cluster and its 
> breakdown by racks. With these data, we can determine whether network 
> bandwidth is the bottleneck when certain jobs are running on a cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
