[
https://issues.apache.org/jira/browse/HADOOP-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584962#action_12584962
]
Raghu Angadi commented on HADOOP-1702:
--------------------------------------
The dip in the DFSIO benchmark turned out to be because DFSIO creates files
with a buffer size of 1,000,000! The buffer size passed while creating a file
is passed on to the FileSystem implementation (DFSClient in this case). This
brings up the question of how an implementation should treat a user-specified
buffer size. Can increasing the buffer size (as in this case) reduce
performance, i.e. should an implementation allow it?
This is what happens on trunk:
- The user-specified buffer size effectively does not matter.
- The client buffers writes into packets of 64k and flushes a packet once it
is full. There can be at most 10 such packets in the pipeline at a time,
usually much less (see the sketch below).
- DataNodes use io.file.buffer.size for their streams.
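A rough sketch of the client-side packet pipeline described above;
PacketWriter and its internals are hypothetical names for illustration, not
DFSClient's actual code:
{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PacketWriter {
  private final int packetSize;
  // A bounded queue models the pipeline: put() blocks once maxPackets
  // are in flight (sent but not yet acked). A real client would have a
  // separate responder thread draining acked packets.
  private final BlockingQueue<byte[]> pipeline;
  private byte[] packet;
  private int pos = 0;

  PacketWriter(int packetSize, int maxPackets) {
    this.packetSize = packetSize;
    this.pipeline = new ArrayBlockingQueue<byte[]>(maxPackets);
    this.packet = new byte[packetSize];
  }

  void write(byte[] b, int off, int len) throws InterruptedException {
    while (len > 0) {
      int n = Math.min(len, packetSize - pos);
      System.arraycopy(b, off, packet, pos, n);
      pos += n; off += n; len -= n;
      if (pos == packetSize) {      // flush only when the packet is full
        pipeline.put(packet);       // blocks while the pipeline is full
        packet = new byte[packetSize];
        pos = 0;
      }
    }
  }
}
{code}
On trunk the user's buffer size never reaches this path: the client always
uses the equivalent of {{new PacketWriter(64 * 1024, 10)}}.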
With the patch here:
- The user-specified buffer size sets the packet size.
- At DataNodes, the packet size dictates the write size for the mirror stream
and the local file (i.e. io.file.buffer.size does not matter).
- The rest is the same.
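In terms of the hypothetical sketch above, the patch effectively amounts to
this:
{code:java}
// The buffer size the user passed to FileSystem.create() becomes the packet
// size, so DFSIO's 1,000,000-byte buffer would produce 1,000,000-byte
// packets; the 10-packet pipeline limit stays the same.
PacketWriter writerForCreate(int userBufferSize) {
  return new PacketWriter(userBufferSize, 10);
}
{code}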
Another proposal:
- {{packetSize = Min( 64k, buffersize );}}
- {{Max # packets in pipeline = Max(buffersize/packetSize, 10)}}
'64k' here could be made configurable (maybe "dfs.write.packet.size") so that
different 'real' buffer sizes could be used for experimentation; a sketch of
these rules follows below.
How does the above proposal sound?
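A minimal sketch of the proposed sizing rules, assuming the suggested (not
yet existing) {{dfs.write.packet.size}} key with a 64k default:
{code:java}
class ProposedSizing {
  static final int DEFAULT_PACKET_SIZE = 64 * 1024; // "dfs.write.packet.size"

  static int packetSize(int bufferSize) {
    return Math.min(DEFAULT_PACKET_SIZE, bufferSize);
  }

  static int maxPacketsInPipeline(int bufferSize) {
    return Math.max(bufferSize / packetSize(bufferSize), 10);
  }

  public static void main(String[] args) {
    int buf = 1000000; // DFSIO's buffer size
    System.out.println(packetSize(buf));           // 65536
    System.out.println(maxPacketsInPipeline(buf)); // max(15, 10) = 15
  }
}
{code}
So a large user buffer no longer inflates the packet size; it only deepens
the pipeline.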
> Reduce buffer copies when data is written to DFS
> ------------------------------------------------
>
> Key: HADOOP-1702
> URL: https://issues.apache.org/jira/browse/HADOOP-1702
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.14.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Attachments: HADOOP-1702.patch
>
>
> HADOOP-1649 adds extra buffering to improve write performance. The following
> diagram shows the buffers, pointed to by (numbers). Each extra buffer adds an
> extra copy since most of our read()/write()s match io.bytes.per.checksum,
> which is much smaller than the buffer size.
> {noformat}
>       (1)                (2)       (3)                  (5)
>   +---||----[ CLIENT ]---||----<>-----||---[ DATANODE ]---||--<>-> to Mirror
>   |                  (buffer) (socket)        |   (4)
>   |                                        +--||--+
> =====                                         |
> =====                                       =====
> (disk)                                      =====
> {noformat}
> Currently, the loops that read and write block data handle one checksum chunk
> at a time. By reading multiple chunks at a time, we can remove buffers (1),
> (2), (3), and (5).
> Similarly, some copies can be reduced when clients read data from the DFS.