[ https://issues.apache.org/jira/browse/HDFS-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619104#comment-14619104 ]
Kihwal Lee commented on HDFS-8722:
----------------------------------
Forgot to remove one line in the patch.
> Optimize datanode writes for small writes and flushes
> -----------------------------------------------------
>
> Key: HDFS-8722
> URL: https://issues.apache.org/jira/browse/HDFS-8722
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-8722.patch
>
>
> After the data corruption fix in HDFS-4660, the CRC recalculation for the
> partial chunk is executed more frequently if the client repeatedly writes a
> few bytes and calls hflush/hsync. This is because the generic logic forces a
> CRC recalculation whenever the on-disk data is not CRC chunk aligned. Prior
> to HDFS-4660, the datanode blindly accepted whatever CRC the client provided,
> as long as the incoming data was chunk-aligned. This was the source of the
> corruption.
> We can still optimize for the most common case, where a client repeatedly
> writes a small number of bytes followed by hflush/hsync with no pipeline
> recovery or append, by allowing the previous behavior for this specific case.
> If the incoming data contains a duplicate portion that begins exactly at the
> last chunk boundary before the partial chunk on disk, the datanode can use
> the checksum supplied by the client without recomputing the checksum on its
> own. This reduces disk reads as well as CPU load for the checksum
> calculation. If the incoming packet data goes back further than the last
> on-disk chunk boundary, the datanode will still recalculate, but that happens
> only rarely, during pipeline recoveries. The optimization for this specific
> case should therefore be enough to speed up the vast majority of writes.
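For illustration only, a minimal sketch of the boundary check described in the description above. This is not the attached patch; the names onDiskLen, packetOffsetInBlock, and bytesPerChecksum are hypothetical stand-ins for the datanode's internal BlockReceiver state.
{code:java}
// Sketch of the decision described above: reuse the client-supplied CRC
// only when the resent (duplicate) data starts exactly at the last chunk
// boundary before the partial chunk on disk. Hypothetical names throughout.
public class PartialChunkCrcSketch {

    /**
     * @param onDiskLen           bytes of the replica already on disk
     * @param packetOffsetInBlock block offset where the incoming packet starts
     * @param bytesPerChecksum    CRC chunk size (e.g. 512)
     * @return true if the client's checksum for the partial chunk can be
     *         trusted without re-reading the on-disk data
     */
    static boolean canReuseClientChecksum(long onDiskLen,
                                          long packetOffsetInBlock,
                                          int bytesPerChecksum) {
        // Offset of the last chunk boundary at or before the on-disk end.
        long lastChunkBoundary = onDiskLen - (onDiskLen % bytesPerChecksum);

        // Common hflush/hsync case: the packet resends data starting exactly
        // at that boundary, so the client's CRC covers the whole partial
        // chunk and no disk read or recomputation is needed. If the packet
        // starts earlier (pipeline recovery / append), fall back to the
        // generic recalculation path.
        return packetOffsetInBlock == lastChunkBoundary;
    }

    public static void main(String[] args) {
        // On-disk replica has 1000 bytes; with 512-byte chunks the last
        // boundary is 512, and bytes 512..999 form the partial chunk.
        System.out.println(canReuseClientChecksum(1000, 512, 512)); // true
        System.out.println(canReuseClientChecksum(1000, 0, 512));   // false
    }
}
{code}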
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)