[ 
https://issues.apache.org/jira/browse/HDFS-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618091#comment-14618091
 ] 

Walter Su commented on HDFS-8719:
---------------------------------

It happens when writing the last stripe of a fully filled block.
{noformat}
computePacketChunkSize: src=/MoreThanABlockGroup3, chunkSize=516, 
chunksPerPacket=1, packetSize=516
{noformat}
The parity cell is 64k, and chunksPerPacket=1 produces 515*3 unnecessary packets.


DFSStripedOutputStream writes the parity blocks and then starts writing the next 
BlockGroup. remainingBytes==0 is a disaster for the streamers writing parity 
cells, because remainingBytes==0 causes one-chunk packets (chunksPerPacket=1).
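To illustrate the degenerate case, here is a sketch modeled on 
DFSOutputStream.computePacketChunkSize(); the class, constants, and helper name 
below are illustrative, not the exact branch code:

```java
// Sketch of the packet-sizing arithmetic; names and constants are
// illustrative, not copied from the HDFS-7285 branch.
public class PacketSizing {
  static final int CHUNK_DATA = 512;        // user data bytes per chunk
  static final int CHECKSUM = 4;            // CRC bytes per chunk
  static final int WRITE_PACKET_SIZE = 64 * 1024;

  /** Chunks that fit in one packet, given the bytes still writable in the block. */
  static int chunksPerPacket(long remainingBytes) {
    int psize = (int) Math.min(WRITE_PACKET_SIZE, remainingBytes);
    int chunkSize = CHUNK_DATA + CHECKSUM;  // 516, as in the log above
    return Math.max(psize / chunkSize, 1);  // floored, but never below 1
  }

  public static void main(String[] args) {
    // plenty of room left in the block: full 127-chunk packets
    System.out.println(chunksPerPacket(128L * 1024 * 1024)); // 127
    // remainingBytes == 0 (parity streamer after a full block group):
    // psize == 0, so every packet carries a single 516-byte chunk
    System.out.println(chunksPerPacket(0)); // 1
  }
}
```

With remainingBytes==0 the min() collapses psize to 0 and the max() floor kicks 
in, which is exactly the one-chunk-per-packet behavior seen in the log.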

{code}
+      if (remainingBytes > 0) {
{code}
The change above makes sense. One problem: have you considered 
remainingBytes==1, i.e. it's already the last BlockGroup and we happen to be 
writing the last stripe of a fully filled block?
Assume blockSize=128MB, so blockGroupSize = 128MB*6 = 768MB. What if the 
fileSize is (768MB - 1 byte)? In that case there are still 515*3 unnecessary 
packets.
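To put rough numbers on how much the degenerate sizing costs per cell, a sketch 
with a hypothetical helper (cell and chunk sizes as discussed above):

```java
// Illustrative arithmetic only; the helper name is hypothetical and the
// sizes match the 64 KB cell / 512-byte chunk defaults discussed above.
public class ParityPacketCount {
  /** Packets needed to flush one parity cell at a given chunksPerPacket. */
  static int packetsPerCell(int cellBytes, int chunksPerPacket) {
    int dataChunks = cellBytes / 512;                          // chunks in the cell
    return (dataChunks + chunksPerPacket - 1) / chunksPerPacket; // ceiling division
  }

  public static void main(String[] args) {
    int cell = 64 * 1024;
    // degenerate case (remainingBytes == 0 or 1): one chunk per packet
    System.out.println(packetsPerCell(cell, 1));   // 128
    // healthy case: 127 chunks of 516 bytes fit in a 64 KB packet
    System.out.println(packetsPerCell(cell, 127)); // 2
  }
}
```

So each parity streamer sends two orders of magnitude more packets per cell in 
the degenerate case, which is why fixing only the remainingBytes==0 path is not 
enough.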

nits:
1. Overriding it in DFSStripedOutputStream makes the merging process easier. 
The logic only makes sense in the EC branch, because it only applies when there 
are multiple concurrent streamers.
2. The patch name should be {{HDFS-8719-HDFS-7285-001.patch}}

> Erasure Coding: client generates too many small packets when writing parity 
> data
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-8719
>                 URL: https://issues.apache.org/jira/browse/HDFS-8719
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-8719-001.patch
>
>
> Typically a packet is about 64K, but when writing parity data, many small 
> packets with size 512 bytes are generated. This may slow the write speed and 
> increase the network IO.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)