[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166613#comment-15166613
 ] 

Mingliang Liu commented on HDFS-7661:
-------------------------------------

Thanks for your comments, [~drankye].

1. Augmenting the crc file, i.e. the meta file, is possible. However, it becomes 
too complicated if we interleave the checksum and BG length records. If we 
instead place them in two segments of the .meta file, as | header | crc | bglen 
records |, the CRC section has to be pre-reserved, which leads to holes in the 
file. Meanwhile, the {{.bglen}} file is treated as a redo/undo log whose records:
  * indicate the state of the parity block data file (i.e. its last cell): 
complete or incomplete, where incomplete means a partial parity cell.
  * roll back the last cell to the previous healthy data if the state is 
incomplete. If the last cell is being overwritten, we need to roll back to the 
state before the overwrite happened; otherwise, the last cell is simply 
abandoned.
We don't need these records for the original data blocks. I'll update the design 
doc in detail to show how we can roll back safely using the {{.bglen}} records.
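To make the recovery rule concrete, here is a minimal sketch of the idea. All names and the record layout are hypothetical (this is not the actual patch): each log record carries the block group length at flush time, the last-cell state, and, only when a cell was being overwritten, the pre-overwrite cell bytes needed to undo it.

```java
// Hypothetical sketch of the .bglen redo/undo log idea; names and layout
// are illustrative, not the actual HDFS-7661 implementation.
public class BlockGroupLenLog {

    enum CellState { COMPLETE, INCOMPLETE }  // INCOMPLETE = partial parity cell

    // One log record: block group length at flush time, last-cell state, and
    // (only for overwrites) the pre-overwrite cell bytes needed for undo.
    static final class Record {
        final long blockGroupLen;
        final CellState state;
        final byte[] undoCell;  // null unless an existing cell was being overwritten

        Record(long blockGroupLen, CellState state, byte[] undoCell) {
            this.blockGroupLen = blockGroupLen;
            this.state = state;
            this.undoCell = undoCell;
        }
    }

    /**
     * Recovery: given the last record, return how many bytes of the parity
     * block data file remain valid after rollback.
     */
    static long recover(Record last, long cellSize) {
        if (last.state == CellState.COMPLETE) {
            return last.blockGroupLen;                       // redo: nothing to undo
        }
        long cellStart = last.blockGroupLen - last.blockGroupLen % cellSize;
        if (last.undoCell != null) {
            // Undo an in-progress overwrite: the pre-overwrite cell bytes are
            // written back, so the valid length is cellStart + undoCell.length.
            return cellStart + last.undoCell.length;
        }
        return cellStart;  // no overwrite: simply abandon the partial last cell
    }
}
```

The key design point the sketch illustrates: a COMPLETE record is a no-op on recovery, while an INCOMPLETE record either truncates the partial cell or restores the saved pre-overwrite bytes.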

2. I totally agree we should document the definitions of {{offsetInBlock}}, 
{{packetLen}}, and {{blockGroupLen}}, and why we need them in the first place. 
Based on an offline discussion with [~demongaorui] yesterday, we're refining the 
design doc with more detailed design motivation, showing the challenging 
scenarios and why advanced techniques are needed to address them. [~demongaorui] 
and I will share the design doc later this week. I'd appreciate your further 
review and comments.

3. The intent of the example was that we should not make any assumptions about 
the packet size and the cell size, not that they're naturally different. The 
fact is that they can be different and unaligned. Actually, the current defaults 
are not aligned: the packet data size is 63 KB while the cell size is 64 KB 
(just as the example showed). The cell size is EC-policy dependent, while we 
have different constraints on the packet data size; refer to [HDFS-7308]. The 
best we can do is forcefully align them, in which case we still need to handle 
the scenarios where one cell needs multiple transmission packets or one packet 
contains multiple cells.

Ping [~demongaorui] for discussion.

> Erasure coding: support hflush and hsync
> ----------------------------------------
>
>                 Key: HDFS-7661
>                 URL: https://issues.apache.org/jira/browse/HDFS-7661
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: GAO Rui
>         Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf, 
> HDFS-EC-file-flush-sync-design-version2.0.pdf
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)