[
https://issues.apache.org/jira/browse/HDFS-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951715#comment-15951715
]
Xiaobing Zhou edited comment on HDFS-11608 at 3/31/17 11:17 PM:
----------------------------------------------------------------
After some debugging, it turns out this is caused by integer overflow.
adjustChunkBoundary casts a long to an int inside Math.min, which overflows
(i.e. psize == -2147483648). Moreover, with the change to
computePacketChunkSize in HDFS-7308, (psize - PacketHeader.PKT_MAX_HEADER_LEN)
overflows a second time (i.e. bodySize becomes 2147483615, since
(-2147483648 - 33) wraps around), so chunksPerPacket == 4161789 and
packetSize == 516 * 4161789 == 2147483124, finally causing the out-of-memory
and invalid-payload issues.
Note that without HDFS-7308, Math.max(psize/chunkSize, 1) does not overflow
again: the division of the negative psize yields a negative value, so
Math.max returns 1, which is safe.
The code changed by HDFS-7308:
{code}
private void computePacketChunkSize(int psize, int csize) {
+ final int bodySize = psize - PacketHeader.PKT_MAX_HEADER_LEN;
final int chunkSize = csize + getChecksumSize();
- chunksPerPacket = Math.max(psize/chunkSize, 1);
+ chunksPerPacket = Math.max(bodySize/chunkSize, 1);
{code}
DFSOutputStream#adjustChunkBoundary
{code}
if (!getStreamer().getAppendChunk()) {
int psize = Math.min((int)(blockSize- getStreamer().getBytesCurBlock()),
dfsClient.getConf().getWritePacketSize());
computePacketChunkSize(psize, bytesPerChecksum);
}
{code}
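For reference, one way to avoid the first overflow would be to take the
minimum in long arithmetic before narrowing to int; the result is then
bounded by the write packet size, so the cast is safe. This is only an
illustrative sketch against the snippet above, not the attached patch:
{code}
if (!getStreamer().getAppendChunk()) {
  // Compare as longs first; getWritePacketSize() widens to long, and the
  // min is at most the packet size, so the narrowing cast cannot overflow.
  int psize = (int) Math.min(blockSize - getStreamer().getBytesCurBlock(),
      dfsClient.getConf().getWritePacketSize());
  computePacketChunkSize(psize, bytesPerChecksum);
}
{code}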
> HDFS write crashed in the case of huge block size
> -------------------------------------------------
>
> Key: HDFS-11608
> URL: https://issues.apache.org/jira/browse/HDFS-11608
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.8.0
> Reporter: Xiaobing Zhou
> Assignee: Xiaobing Zhou
> Priority: Critical
> Attachments: HDFS-11608.000.patch
>
>
> We've seen HDFS writes crash in the case of a huge block size. For example,
> when writing a 3G file with a 3G block size, the HDFS client throws an
> out-of-memory exception and the DataNode reports an IOException. After raising
> the heap size limit, a DFSOutputStream ResponseProcessor exception is seen,
> followed by a broken pipe and pipeline recovery.
> Given below is the DN exception:
> {noformat}
> 2017-03-30 16:34:33,828 ERROR datanode.DataNode (DataXceiver.java:run(278)) - c6401.ambari.apache.org:50010:DataXceiver error processing WRITE_BLOCK operation src: /192.168.64.101:47167 dst: /192.168.64.101:50010
> java.io.IOException: Incorrect value for packet payload size: 2147483128
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:502)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:898)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:806)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}