[ https://issues.apache.org/jira/browse/HDFS-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-5704:
----------------------------

    Attachment: HDFS-5704.000.patch

Early patch just for review. 

Currently several calls can trigger an UPDATE_BLOCKS operation: 
ClientProtocol#abandonBlock, ClientProtocol#fsync, ClientProtocol#addBlock, 
DatanodeProtocol#commitBlockSynchronization, and ClientProtocol#updatePipeline. 
This patch adds a new editlog op OP_ADD_BLOCK for ClientProtocol#addBlock. 
Specifically, since FSNamesystem#getAdditionalBlock can update the original 
last block of the file (i.e., update its length and change its state), the 
new AddBlockOp records both the new block and the original last block.
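
To illustrate what such a record might carry, here is a minimal Java sketch of a 
hypothetical AddBlockOp that serializes the file path, the original last block 
(absent for a file's first block) and the new block. The class names, the Block 
fields and the byte layout are assumptions for illustration only, not the format 
in HDFS-5704.000.patch; the point is that the record size stays constant however 
many blocks the file already has.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical, simplified model of an OP_ADD_BLOCK record (not the patch's wire format). */
public class AddBlockOpSketch {

    /** Minimal block descriptor: ID, length in bytes, generation stamp (assumed fields). */
    record Block(long blockId, long numBytes, long genStamp) {
        void write(DataOutput out) throws IOException {
            out.writeLong(blockId);
            out.writeLong(numBytes);
            out.writeLong(genStamp);
        }

        static Block read(DataInput in) throws IOException {
            return new Block(in.readLong(), in.readLong(), in.readLong());
        }
    }

    /** File path, its original last block (null for the first block), and the new block. */
    record AddBlockOp(String path, Block originalLastBlock, Block newBlock) {
        void write(DataOutput out) throws IOException {
            out.writeUTF(path);
            // At most two blocks are written, however many blocks the file already has.
            out.writeBoolean(originalLastBlock != null);
            if (originalLastBlock != null) {
                originalLastBlock.write(out);
            }
            newBlock.write(out);
        }

        static AddBlockOp read(DataInput in) throws IOException {
            String path = in.readUTF();
            Block last = in.readBoolean() ? Block.read(in) : null;
            return new AddBlockOp(path, last, Block.read(in));
        }
    }

    static byte[] serialize(AddBlockOp op) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            op.write(new DataOutputStream(buf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static AddBlockOp deserialize(byte[] bytes) {
        try {
            return AddBlockOp.read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        AddBlockOp op = new AddBlockOp("/user/foo/file",
                new Block(1001L, 134217728L, 7L), // original last block, now full
                new Block(1002L, 0L, 7L));        // newly allocated, still empty
        byte[] bytes = serialize(op);
        System.out.println(deserialize(bytes).equals(op) + ", " + bytes.length + " bytes");
    }
}
```

On replay, a loader would presumably overwrite the file's current last block with 
originalLastBlock and then append newBlock, instead of replacing the whole list.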

Also, it looks like getAdditionalBlock already handles the retry scenario, so 
I think we will not see repeated AddBlockOps in the editlog.
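
The retry case can be pictured with a small toy model. This is a hypothetical 
sketch, not the FSNamesystem#getAdditionalBlock code: it assumes a retry can be 
recognized because the client's "previous" block is already the file's 
penultimate block, in which case the block allocated by the earlier call is 
returned and no second op is logged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical toy model, not HDFS code: a retried addBlock call is detected
 * and answered without logging a second OP_ADD_BLOCK.
 */
public class AddBlockRetrySketch {
    private final List<Long> blocks = new ArrayList<>();    // block IDs of one file, in order
    private final List<String> editLog = new ArrayList<>(); // ops "logged" so far
    private long nextBlockId = 1;

    /**
     * @param previous ID of the last block the client knows about, or null
     *                 when the client is asking for the file's first block
     */
    public long addBlock(Long previous) {
        int n = blocks.size();
        Long last = n > 0 ? blocks.get(n - 1) : null;
        Long penultimate = n > 1 ? blocks.get(n - 2) : null;

        // Retry: the client's "previous" is already the penultimate block, so
        // the earlier call succeeded. Return its block and log nothing.
        if (last != null && Objects.equals(previous, penultimate)) {
            return last;
        }
        if (!Objects.equals(previous, last)) {
            throw new IllegalStateException("unexpected previous block " + previous);
        }
        long newBlock = nextBlockId++;
        blocks.add(newBlock);
        editLog.add("OP_ADD_BLOCK previous=" + last + " new=" + newBlock);
        return newBlock;
    }

    public int editLogSize() {
        return editLog.size();
    }

    public static void main(String[] args) {
        AddBlockRetrySketch file = new AddBlockRetrySketch();
        long b1 = file.addBlock(null);
        long b1Again = file.addBlock(null); // simulated retry after a lost response
        long b2 = file.addBlock(b1);
        System.out.println("b1=" + b1 + " retry=" + b1Again + " b2=" + b2
                + " ops=" + file.editLogSize());
    }
}
```

The real method also has to check things this model ignores (for example, the 
state of the last block), so treat it only as an illustration of why a retry 
need not produce a duplicate record.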

Will add more unit tests in the next patch.

> Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK
> ------------------------------------------------
>
>                 Key: HDFS-5704
>                 URL: https://issues.apache.org/jira/browse/HDFS-5704
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Suresh Srinivas
>            Assignee: Jing Zhao
>         Attachments: HDFS-5704.000.patch
>
>
> Currently, every time a block is allocated, the entire list of blocks is 
> written to the editlog in an OP_UPDATE_BLOCKS operation. This causes n^2 
> growth: the total size of the editlog records for a file with a large number 
> of blocks can be huge.
> The goal of this jira is to discuss adding a different editlog record that 
> records only the allocation of a block, not the entire block list, on every 
> block allocation.
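
A rough worked comparison of the two schemes, under deliberately simplified 
assumptions: every block record costs a fixed 24 bytes (three longs), and op 
headers, paths and other fields are ignored. Allocating block k rewrites k 
blocks under OP_UPDATE_BLOCKS, so n allocations write 24*n(n+1)/2 bytes; an 
OP_ADD_BLOCK that carries at most the previous last block plus the new block 
writes 24*(2n-1) bytes. The constant and method names are hypothetical.

```java
/** Hypothetical cost model comparing cumulative editlog bytes for one file. */
public class EditLogGrowthSketch {
    // Assumed size of one serialized block record (ID, length, genstamp as longs).
    static final long BYTES_PER_BLOCK = 24;

    /** OP_UPDATE_BLOCKS model: allocating block k rewrites all k blocks. */
    static long updateBlocksBytes(long n) {
        return BYTES_PER_BLOCK * n * (n + 1) / 2;
    }

    /** OP_ADD_BLOCK model: first allocation writes 1 block, each later one writes 2. */
    static long addBlockBytes(long n) {
        if (n == 0) {
            return 0;
        }
        return BYTES_PER_BLOCK * (1 + 2 * (n - 1));
    }

    public static void main(String[] args) {
        for (long n : new long[] {10, 1_000, 100_000}) {
            System.out.printf("n=%d  UPDATE_BLOCKS=%d bytes  ADD_BLOCK=%d bytes%n",
                    n, updateBlocksBytes(n), addBlockBytes(n));
        }
    }
}
```

With these assumptions, a 1,000-block file costs about 12 MB of block records 
under OP_UPDATE_BLOCKS versus about 48 KB with OP_ADD_BLOCK.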



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)