[ https://issues.apache.org/jira/browse/HDFS-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619467#comment-14619467 ]
Jing Zhao commented on HDFS-8734:
---------------------------------

The analysis makes sense to me. But it looks like we cannot fix the issue this way, since the currentPacket variable is shared by all the streamers.

BTW, we may need to have streamer[] and packet[] for DFSStripedOutputStream instead of using the same variables and repeatedly refreshing their values. But that also requires a lot of code refactoring in DFSOutputStream.

> Erasure Coding: fix one cell need two packets
> ---------------------------------------------
>
>                 Key: HDFS-8734
>                 URL: https://issues.apache.org/jira/browse/HDFS-8734
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>            Assignee: Walter Su
>         Attachments: HDFS-8734.01.patch
>
>
> The default WritePacketSize is 64k.
> The current default cellSize is 64k.
> We hope one cell consumes exactly one packet. In fact it does not.
> By default,
> chunkSize = 516 (512 data + 4 checksum)
> packetSize = 64k
> chunksPerPacket = 126 (see DFSOutputStream#computePacketChunkSize for details)
> numBytes of data in one packet = 126 * 512 = 64512
> cellSize = 65536
> When the first packet is full (with 64512 bytes of data), there are still 65536 - 64512 = 1024 bytes of the cell left.
> {code}
> super.writeChunk(bytes, offset, len, checksum, ckoff, cklen);
> // cell is full and current packet has not been enqueued,
> if (cellFull && currentPacket != null) {
>   enqueueCurrentPacketFull();
> }
> {code}
> When the last 1024 bytes of the cell are written, we hit {{cellFull}} and enqueue a second packet for the same cell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
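
For reference, the packet arithmetic in the quoted description can be reproduced with a small standalone sketch. This is not the Hadoop code: the class and method names are hypothetical, and the header reserve of 33 bytes is an assumption (the description only points to DFSOutputStream#computePacketChunkSize). Any reserve between 5 and 520 bytes yields the same 126 chunks per packet.

```java
// Standalone sketch of the numbers in the issue description.
// PacketCellMath and its members are hypothetical names, not Hadoop APIs.
class PacketCellMath {
    // Values stated in the description.
    static final int BYTES_PER_CHECKSUM = 512;        // data bytes per chunk
    static final int CHECKSUM_SIZE = 4;               // checksum bytes per chunk
    static final int WRITE_PACKET_SIZE = 64 * 1024;   // default WritePacketSize
    static final int CELL_SIZE = 64 * 1024;           // default cellSize

    // Assumption: bytes reserved for the packet header before chunks are packed.
    static final int ASSUMED_MAX_HEADER_LEN = 33;

    // Number of whole 516-byte chunks that fit into the packet body.
    static int chunksPerPacket() {
        int chunkSize = BYTES_PER_CHECKSUM + CHECKSUM_SIZE;
        int bodySize = WRITE_PACKET_SIZE - ASSUMED_MAX_HEADER_LEN;
        return Math.max(bodySize / chunkSize, 1);
    }

    // Payload (non-checksum) bytes carried by one full packet.
    static int dataBytesPerPacket() {
        return chunksPerPacket() * BYTES_PER_CHECKSUM;
    }

    // Cell bytes that do not fit into the first full packet.
    static int leftoverBytesInCell() {
        return CELL_SIZE - dataBytesPerPacket();
    }

    // Packets needed to carry one full cell (ceiling division).
    static int packetsPerCell() {
        int data = dataBytesPerPacket();
        return (CELL_SIZE + data - 1) / data;
    }

    public static void main(String[] args) {
        System.out.println("chunksPerPacket    = " + chunksPerPacket());
        System.out.println("dataBytesPerPacket = " + dataBytesPerPacket());
        System.out.println("leftoverBytes      = " + leftoverBytesInCell());
        System.out.println("packetsPerCell     = " + packetsPerCell());
    }
}
```

Under these assumptions a 64k cell always spills 1024 bytes into a second packet, which matches the behaviour the description reports.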