[
https://issues.apache.org/jira/browse/HADOOP-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575553#action_12575553
]
rangadi edited comment on HADOOP-1702 at 3/5/08 8:25 PM:
--------------------------------------------------------------
Test results show a *30%* improvement in DataNode CPU with the patch. I think
it makes sense. Based on the picture above, before this patch, with a
replication of 3, the data is copied 6 + 6 + 4 times, and with this patch it
is 3 + 3 + 2. Each of these datanodes verifies the CRC. Approximating the cost
of checksumming as twice that of a memory copy, we get (8+6)/(14+6) == 70%. If
we increase the size of the checksum chunk, the cost of CRC will go down; it
will be 68% with a factor of 1.5 for CRC.
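To spell out the arithmetic behind those figures (my expansion of the estimate
above, counting CRC verification on each of the three datanodes as 2
copy-equivalents):
{noformat}
with patch : 3 + 3 + 2 = 8 copies, plus 3 datanodes * 2 for CRC = 6  => 14
before     : 14 copies (the count used in the ratio above), plus 6   => 20
ratio      : 14 / 20                                                  = 70%
CRC at 1.5 : (8 + 3 * 1.5) / (14 + 3 * 1.5) = 12.5 / 18.5            ~= 68%
{noformat}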
Test Setup : three instances of 'dd if=/dev/zero 4Gb | hadoop -put - 4Gb'.
More importantly, the DataNode was modified to write the data to '/dev/null'
instead of the block file; otherwise I could not isolate the test from disk
activity. The cluster has 3 datanodes. The clients, Namenode, and datanodes
all run on the same node. The test was CPU bound.
CPU measurement : Linux reports a process's CPU in /proc/<pid>/stat : the 14th
entry is user CPU and the 15th is kernel CPU. I think these are specified in
jiffies. Like most things in the Linux kernel, these are approximations, but
reasonably dependable over large numbers. The numbers reported are the sum of
the CPU for the three datanodes.
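As a reference, a minimal sketch of how these two fields can be read (the
class and the argument handling here are mine, not part of the patch):
{noformat}
import java.io.*;

// Sketch: print user and kernel jiffies for a pid from /proc/<pid>/stat.
// Fields are 1-indexed; utime is field 14, stime is field 15. The comm
// field "(...)" may contain spaces, so we split after the closing ')'.
public class ProcCpu {
  public static void main(String[] args) throws IOException {
    BufferedReader r = new BufferedReader(
        new FileReader("/proc/" + args[0] + "/stat"));
    String line = r.readLine();
    r.close();
    // Token 0 after ") " is field 3 (state), so field 14 is tokens[11]
    // and field 15 is tokens[12].
    String[] tokens = line.substring(line.lastIndexOf(')') + 2).split(" ");
    long user = Long.parseLong(tokens[11]);
    long kernel = Long.parseLong(tokens[12]);
    System.out.println(user + "u " + kernel + "k (jiffies)");
  }
}
{noformat}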
Below, 'u' and 'k' are user and kernel CPU in thousands of jiffies.
|| Test || Run 1 || Run 2 || Run 3 || Avg Total CPU || Avg Time ||
| Trunk* | 8.60u 2.52k 372s | 8.36u 2.48k 368s | 8.39u 2.40k 368s | *10.95* | 369s |
| Trunk + patch* | 5.61u 2.22k 289s | 5.38u 2.16k 296s | 5.57u 2.25k 289s | *7.73 (70%)* | 291s (79%) |
{{*}} : datanodes write data to /dev/null.
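For clarity, the bracketed percentages in the table are ratios against trunk:
{noformat}
Avg Total CPU : 7.73 / 10.95  ~= 70%
Avg Time      : 291s / 369s   ~= 79%
{noformat}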
Currently, the DFSIO benchmark shows a dip in write bandwidth. I am still
looking into it.
> Reduce buffer copies when data is written to DFS
> ------------------------------------------------
>
> Key: HADOOP-1702
> URL: https://issues.apache.org/jira/browse/HADOOP-1702
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.14.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.17.0
>
> Attachments: HADOOP-1702.patch
>
>
> HADOOP-1649 adds extra buffering to improve write performance. The following
> diagram shows the buffers, pointed to by (numbers). Each extra buffer adds an
> extra copy, since most of our read()/write()s match io.bytes.per.checksum,
> which is much smaller than the buffer size.
> {noformat}
>    (1)                (2)          (3)                 (5)
> +---||----[ CLIENT ]---||----<>-----||---[ DATANODE ]---||--<>-> to Mirror
> |                (buffer) (socket)             |   (4)
> |                                              +--||--+
> =====                                                |
> =====                                              =====
> (disk)                                             =====
> {noformat}
> Currently, the loops that read and write block data handle one checksum
> chunk at a time. By reading multiple chunks at a time, we can remove buffers
> (1), (2), (3), and (5).
> Similarly, some copies can be reduced when clients read data from the DFS.
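As an illustration of the multi-chunk idea described above, here is a minimal
sketch (not the actual patch; the class, method, and parameter names are
mine):
{noformat}
import java.io.*;
import java.util.zip.CRC32;

// Sketch: do I/O in large buffers that hold many checksum chunks, while
// still computing one CRC per io.bytes.per.checksum-sized chunk.
public class MultiChunkCopy {
  static void copy(InputStream in, OutputStream out,
                   int bytesPerChecksum, int chunksPerBuffer)
      throws IOException {
    byte[] buf = new byte[bytesPerChecksum * chunksPerBuffer];
    CRC32 crc = new CRC32();
    int n;
    while ((n = in.read(buf)) > 0) {
      for (int off = 0; off < n; off += bytesPerChecksum) {
        int len = Math.min(bytesPerChecksum, n - off);
        crc.reset();
        crc.update(buf, off, len);
        // ... verify or record crc.getValue() for this chunk ...
      }
      out.write(buf, 0, n);  // a single write() covers many chunks
    }
  }
}
{noformat}
With one read()/write() per buffer instead of per chunk, buffers (1), (2),
(3), and (5) in the diagram no longer need copies of their own.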
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.