[
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166613#comment-15166613
]
Mingliang Liu commented on HDFS-7661:
-------------------------------------
Thanks for your comments, [~drankye].
1. Augmenting the CRC file, i.e. the {{.meta}} file, is possible. However, it
becomes too complicated if we interleave the checksum and BG length records. If
we instead place them in two segments of the {{.meta}} file, as | header | crc
| bglen records |, space for the CRC section would have to be reserved up
front, which leads to holes in the file.
Meanwhile, the {{.bglen}} file is treated as a redo/undo log whose records are
used to:
* indicate the state of the parity block data file (i.e. the last cell):
complete or incomplete. Incomplete means a partial parity cell.
* roll back the last cell to the previous healthy data if the state is
incomplete. If the last cell is being overwritten, we need to roll back to the
state before the overwrite happened; otherwise, the last cell is simply
abandoned.
We don't need these records for the original data blocks. I'll update the
design doc in detail to show how we can roll back safely using the {{.bglen}}
records.
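To make the redo/undo semantics concrete, here is a minimal sketch of what one such log record could look like; the names ({{BgLenRecord}}, {{recoveredLength}}) and the exact fields are illustrative assumptions, not code from the patch:

```java
// Hypothetical sketch (not actual HDFS code): one record in the .bglen
// redo/undo log. On recovery, an INCOMPLETE record means the last parity
// cell holds a partial flush, so the block group length is rolled back to
// the last known-good value; a COMPLETE record means the cell is valid.
public class BgLenRecord {
    enum CellState { COMPLETE, INCOMPLETE }

    final CellState state;
    final long prevBlockGroupLen; // length before the last flush (undo value)
    final long newBlockGroupLen;  // length after the last flush (redo value)

    BgLenRecord(CellState state, long prevBlockGroupLen, long newBlockGroupLen) {
        this.state = state;
        this.prevBlockGroupLen = prevBlockGroupLen;
        this.newBlockGroupLen = newBlockGroupLen;
    }

    // Length to trust on recovery: roll back when the last cell is incomplete.
    long recoveredLength() {
        return state == CellState.COMPLETE ? newBlockGroupLen : prevBlockGroupLen;
    }
}
```

The key point is that each record carries both the undo and the redo value, so recovery never has to reconstruct the pre-flush length from the data itself.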
2. I totally agree we should document the {{offsetInBlock, packetLen,
blockGroupLen}} definitions and why we need them in the first place. Based on
an offline discussion with [~demongaorui] yesterday, we're refining the design
doc with more detailed design motivations, which will show the challenging
scenarios and why we need advanced techniques to address them. [~demongaorui]
and I will share the design doc later this week. I appreciate your further
review and comments.
3. The intention of the example was that we should not make any assumption
about the packet size and cell size, not that they are naturally different.
The fact is that they can be different and unaligned. Actually, the current
defaults are not aligned: the packet data size is 63 KB and the cell size is
64 KB (just as the example showed). The cell size is EC-policy dependent,
while we have different constraints on the packet data size; refer to
[HDFS-7308]. The best we can do is forcefully make them aligned, in which case
we still need to deal with scenarios where one cell needs multiple
transmission packets or one packet contains multiple cells.
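The misalignment with the current defaults can be checked with a few lines of arithmetic; the class and method names below are illustrative, not from the HDFS source:

```java
// Illustration: with a 63 KB packet data size and a 64 KB EC cell, packet
// boundaries drift relative to cell boundaries, so a single packet can
// straddle two cells and a single cell can span two packets.
public class PacketCellAlignment {
    static final int PACKET_DATA_SIZE = 63 * 1024; // 64512 bytes
    static final int CELL_SIZE = 64 * 1024;        // 65536 bytes

    // Cell index that a given logical stream offset falls into.
    static int cellIndex(long offset) {
        return (int) (offset / CELL_SIZE);
    }

    public static void main(String[] args) {
        // Packet k covers offsets [k*PACKET_DATA_SIZE, (k+1)*PACKET_DATA_SIZE).
        for (int k = 0; k < 2; k++) {
            long start = (long) k * PACKET_DATA_SIZE;
            long end = start + PACKET_DATA_SIZE - 1;
            System.out.println("packet " + k + " spans cells "
                + cellIndex(start) + ".." + cellIndex(end));
        }
        // Packet 1 starts at offset 64512 (still in cell 0) and ends at
        // offset 129023 (in cell 1): one packet straddles a cell boundary.
    }
}
```

This is exactly the case the example in the design discussion is guarding against: the writer cannot assume a packet boundary is also a cell boundary.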
Ping [~demongaorui] for discussion.
> Erasure coding: support hflush and hsync
> ----------------------------------------
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo Nicholas Sze
> Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png,
> HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch,
> HDFS-EC-file-flush-sync-design-version1.1.pdf,
> HDFS-EC-file-flush-sync-design-version2.0.pdf
>
>
> We also need to support hflush/hsync and visible length.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)