[
https://issues.apache.org/jira/browse/HDFS-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951715#comment-15951715
]
Xiaobing Zhou edited comment on HDFS-11608 at 3/31/17 11:17 PM:
----------------------------------------------------------------
After some debugging, it turns out this is caused by integer overflow.
adjustChunkBoundary casts a long to an int inside Math.min, which overflows
(i.e. psize == -2147483648). Moreover, with the change to
computePacketChunkSize in HDFS-7308, (psize - PacketHeader.PKT_MAX_HEADER_LEN)
overflows a second time (i.e. bodySize becomes 2147483615, since
(-2147483648 - 33) wraps around), so chunksPerPacket == 4161789 and
packetSize == 516 * 4161789 == 2147483124, finally causing the out-of-memory
and invalid-payload issues.
Note that without HDFS-7308, Math.max(psize/chunkSize, 1) does not overflow
again: the division of the negative psize yields a negative value, so
Math.max returns 1, which is safe.
The code changed by HDFS-7308:
{code}
private void computePacketChunkSize(int psize, int csize) {
+ final int bodySize = psize - PacketHeader.PKT_MAX_HEADER_LEN;
final int chunkSize = csize + getChecksumSize();
- chunksPerPacket = Math.max(psize/chunkSize, 1);
+ chunksPerPacket = Math.max(bodySize/chunkSize, 1);
{code}
DFSOutputStream#adjustChunkBoundary
{code}
if (!getStreamer().getAppendChunk()) {
int psize = Math.min((int)(blockSize- getStreamer().getBytesCurBlock()),
dfsClient.getConf().getWritePacketSize());
computePacketChunkSize(psize, bytesPerChecksum);
}
{code}
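For reference, one way to avoid the first overflow would be to take the
minimum in long arithmetic before narrowing to int; the result is then
bounded by the write packet size, so the cast is safe. This is only an
illustrative sketch against the snippet above, not the attached patch:
{code}
if (!getStreamer().getAppendChunk()) {
  // Compare as longs first; getWritePacketSize() widens to long, and the
  // min is at most the packet size, so the narrowing cast cannot overflow.
  int psize = (int) Math.min(blockSize - getStreamer().getBytesCurBlock(),
      dfsClient.getConf().getWritePacketSize());
  computePacketChunkSize(psize, bytesPerChecksum);
}
{code}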
> HDFS write crashed in the case of huge block size
> -------------------------------------------------
>
> Key: HDFS-11608
> URL: https://issues.apache.org/jira/browse/HDFS-11608
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.8.0
> Reporter: Xiaobing Zhou
> Assignee: Xiaobing Zhou
> Priority: Critical
> Attachments: HDFS-11608.000.patch
>
>
> We've seen HDFS writes crash in the case of a huge block size. For example,
> when writing a 3G file with a 3G block size, the HDFS client throws an
> out-of-memory exception and the DataNode reports an IOException. After raising
> the heap size limit, a DFSOutputStream ResponseProcessor exception is seen,
> followed by a broken pipe and pipeline recovery.
> Given below is the DN exception:
> {noformat}
> 2017-03-30 16:34:33,828 ERROR datanode.DataNode (DataXceiver.java:run(278)) - c6401.ambari.apache.org:50010:DataXceiver error processing WRITE_BLOCK operation src: /192.168.64.101:47167 dst: /192.168.64.101:50010
> java.io.IOException: Incorrect value for packet payload size: 2147483128
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:502)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:898)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:806)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}