[ 
https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-3062:
----------------------------------

    Attachment: 3062-0.patch

First draft.

Format:
{noformat}
<log4j schema including timestamp, etc.> src: <src IP>, dest: <dst IP>, bytes: 
<bytes>, op: <op enum>, id: <DFSClient id|taskid>[, blockid: <block id>] 
{noformat}
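As a rough illustration of the "some processing of each entry" step mentioned below, the schema above can be picked apart with a small regular expression. This is only a sketch: the sample line, the op name, and the field values here are invented for illustration and are not taken from the patch.

```python
import re

# Matches the tail of a log line following the schema:
#   src: <src IP>, dest: <dst IP>, bytes: <bytes>, op: <op enum>,
#   id: <DFSClient id|taskid>[, blockid: <block id>]
# The log4j prefix (timestamp, level, category) is skipped by re.search.
LOG_RE = re.compile(
    r"src: (?P<src>\S+), dest: (?P<dest>\S+), bytes: (?P<bytes>\d+), "
    r"op: (?P<op>\S+), id: (?P<id>\S+?)(?:, blockid: (?P<blockid>\S+))?$"
)

def parse(line):
    """Return the named fields as a dict, or None if the line doesn't match."""
    m = LOG_RE.search(line)
    return m.groupdict() if m else None

# Hypothetical sample entry (field values are made up):
sample = ("2008-03-20 12:00:00,000 INFO datanode: "
          "src: 10.0.0.1, dest: 10.0.0.2, bytes: 65536, "
          "op: READ_BLOCK, id: DFSClient_task_0001, blockid: blk_123")
print(parse(sample)["bytes"])  # → '65536'
```

The blockid group is optional, matching the bracketed part of the schema, so entries without it parse with blockid set to None.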

The patch adds the DFSClient clientName to OP_READ_BLOCK and changes the String 
in OP_WRITE_BLOCK from the path (which is unused) to the clientName. If this is 
set to DFSClient_<taskid> in map and reduce tasks, tracing the output of a job 
should be straightforward after some processing of each entry. Writes for 
replications (where the clientName is "") are logged as they have been; the 
logging in PacketResponder has been reformatted to fit the preceding schema. A 
few known issues:

* The logging assumes the IP address is sufficient to distinguish a source, 
particularly for writes and in the shuffle
* This logs to the DataNode and ReduceTask appenders; these entries should be 
directed elsewhere and disabled by default
* In testing this, some entries in the read exhibited a strange property: the 
source and destination match, but neither matches the DataNode on which it is 
logged. I'm clearly missing something.

I tried tracing a few blocks and map outputs through the logs, and the entries 
I traced all made sense. That said, as mentioned in the last bullet, not all of 
the entries did.
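Tracing a job's output along these lines amounts to grouping parsed entries by id and summing bytes. A minimal sketch, assuming entries have already been parsed into dicts with the schema's field names (the ids, op names, and byte counts here are invented for illustration):

```python
from collections import defaultdict

# Hypothetical parsed entries; in practice these would come from the
# DataNode/ReduceTask logs after parsing each line against the schema.
entries = [
    {"id": "DFSClient_task_0001", "op": "READ_BLOCK", "bytes": 65536},
    {"id": "DFSClient_task_0001", "op": "READ_BLOCK", "bytes": 4096},
    {"id": "DFSClient_task_0002", "op": "WRITE_BLOCK", "bytes": 1024},
]

# Aggregate bytes transferred per (client/task id, operation).
totals = defaultdict(int)
for e in entries:
    totals[(e["id"], e["op"])] += e["bytes"]

print(totals[("DFSClient_task_0001", "READ_BLOCK")])  # → 69632
```

Keying on (id, op) rather than id alone keeps reads and writes for the same task separate; a per-rack breakdown would instead key on a rack id derived from the src and dest addresses.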

> Need to capture the metrics for the network IOs generated by dfs reads/writes 
> and map/reduce shuffling and break them down by racks 
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3062
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3062
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Runping Qi
>         Attachments: 3062-0.patch
>
>
> In order to better understand the relationship between hadoop performance and 
> the network bandwidth, we need to know 
> the aggregate traffic data in a cluster and its breakdown by racks. 
> With these data, we can determine whether the network 
> bandwidth is the bottleneck when certain jobs are running on a cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.