[ https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189037#comment-14189037 ]
Tsz Wo Nicholas Sze commented on HDFS-7276:
-------------------------------------------
> ... However, it is unfortunate that our full packet size is 64k + header
> length, which will round up to 128k.
I was wrong about the full packet size. In
DFSOutputStream.computePacketChunkSize(..),
{code}
private void computePacketChunkSize(int psize, int csize) {
  final int chunkSize = csize + getChecksumSize();
  chunksPerPacket = Math.max(psize/chunkSize, 1);
  packetSize = chunkSize*chunksPerPacket;
  if (DFSClient.LOG.isDebugEnabled()) {
    ...
  }
}
{code}
So we have the following values:
|| variables || usual values ||
| psize | dfsClient.getConf().writePacketSize = 64kB |
| csize | bytesPerChecksum = 512B |
| getChecksumSize(), i.e. CRC size | 32B |
| chunkSize = csize + getChecksumSize() | 544B (not a power of two) |
| psize/chunkSize | 120.47 |
| chunksPerPacket = max(psize/chunkSize, 1) | 120 |
| packetSize = chunkSize*chunksPerPacket (not including header) | 65280 |
| PacketHeader.PKT_MAX_HEADER_LEN | 33B |
| actual packet size | 65280 + 33 = *65313* < 65536 = 64k |
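The arithmetic in the table can be reproduced with a small standalone sketch. The class name PacketSizeTable and its method signature are hypothetical and only mirror the formula quoted above. The numeric inputs (64kB, 512B, 32B, 33B) are copied from the table, not read from the HDFS code.

```java
// Standalone sketch; PacketSizeTable is a hypothetical name, not part of HDFS.
class PacketSizeTable {
  // Mirrors the quoted computePacketChunkSize(..) formula.
  // The returned size does not include the packet header.
  static int packetSize(int psize, int csize, int checksumSize) {
    final int chunkSize = csize + checksumSize;
    final int chunksPerPacket = Math.max(psize / chunkSize, 1);
    return chunkSize * chunksPerPacket;
  }

  public static void main(String[] args) {
    final int psize = 64 * 1024;  // writePacketSize, from the table
    final int csize = 512;        // bytesPerChecksum, from the table
    final int checksumSize = 32;  // checksum size as listed in the table
    final int maxHeaderLen = 33;  // PKT_MAX_HEADER_LEN, from the table

    final int body = packetSize(psize, csize, checksumSize);
    System.out.println("packetSize (no header) = " + body);
    System.out.println("actual packet size     = " + (body + maxHeaderLen));
  }
}
```

With the table's inputs this gives 65280 without the header and 65313 with it.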
It is fortunate that the usual actual packet size = 65313 < 64k, although the
calculation above does not guarantee it (e.g. if PKT_MAX_HEADER_LEN=257, then
actual packet size = 65280 + 257 = 65537 > 64k). I will fix the computation in
order to guarantee that the actual packet size is < 64k.
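One possible way to get such a guarantee is to reserve room for the header before dividing by the chunk size. The sketch below is only an illustration under that assumption and is not necessarily what the patch does. The class name BoundedPacketSize and the extra maxHeaderLen parameter are hypothetical.

```java
// Hypothetical sketch of one possible fix; not taken from the HDFS patch.
class BoundedPacketSize {
  // Subtracts the maximum header length from psize first, so that
  // header + returned size stays within psize whenever at least one
  // chunk plus the header fits in psize.
  static int packetSize(int psize, int csize, int checksumSize,
                        int maxHeaderLen) {
    final int bodySize = psize - maxHeaderLen;
    final int chunkSize = csize + checksumSize;
    final int chunksPerPacket = Math.max(bodySize / chunkSize, 1);
    return chunkSize * chunksPerPacket;
  }

  public static void main(String[] args) {
    final int psize = 64 * 1024;  // writePacketSize, from the table
    // 33 is the header length from the table; 257 is the counterexample.
    for (int headerLen : new int[] {33, 257}) {
      final int total = packetSize(psize, 512, 32, headerLen) + headerLen;
      System.out.println("headerLen=" + headerLen
          + " actual packet size=" + total
          + " withinLimit=" + (total <= psize));
    }
  }
}
```

For the 257-byte header case the chunk count drops from 120 to 119, giving 64736 + 257 = 64993. The Math.max(.., 1) floor means the bound can still be exceeded if a single chunk plus the header is larger than psize.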
> Limit the number of byte arrays used by DFSOutputStream
> -------------------------------------------------------
>
> Key: HDFS-7276
> URL: https://issues.apache.org/jira/browse/HDFS-7276
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Attachments: h7276_20141021.patch, h7276_20141022.patch,
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch,
> h7276_20141027b.patch, h7276_20141028.patch
>
>
> When there are a lot of DFSOutputStream's writing concurrently, the number of
> outstanding packets could be large. The byte arrays created by those packets
> could occupy a lot of memory.