[
https://issues.apache.org/jira/browse/HDFS-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhe Zhang updated HDFS-8453:
----------------------------
Status: Patch Available (was: In Progress)
Actually, it is not possible to assign meaningful start offset values to all
internal blocks, especially parity ones. Consider a block group with 1 byte of
data. No matter how the start offsets for parity blocks are set (negative
values, etc.), they will overlap with the next block group in the file.
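To make the 1-byte case concrete, here is a minimal sketch of the arithmetic. It is not Hadoop code: the class and method names are hypothetical, and the 6 data + 3 parity layout and 64 KiB cell size are assumptions chosen for the example. It applies the formula from the issue description and checks, as one way of reading "overlap", whether the magnitude of each assigned offset still falls inside the byte range that the block group itself covers in the file.

```java
// Illustrative sketch only; none of these names are Hadoop APIs.
final class StartOffsetSketch {
    // Assumed layout for the example: 6 data blocks, 3 parity blocks, 64 KiB cells.
    static final int DATA_BLOCKS = 6;
    static final long CELL_SIZE = 64L * 1024;

    /** Start offset as planned in the issue description (negative for parity blocks). */
    static long plannedStartOffset(long bgStartOffset, int idxInBlockGroup) {
        long magnitude = bgStartOffset + idxInBlockGroup * CELL_SIZE;
        return idxInBlockGroup < DATA_BLOCKS ? magnitude : -1 * magnitude;
    }

    /** True if |offset| lies in [bgStartOffset, bgStartOffset + bgNumBytes). */
    static boolean withinOwnGroup(long offset, long bgStartOffset, long bgNumBytes) {
        long magnitude = Math.abs(offset);
        return magnitude >= bgStartOffset && magnitude < bgStartOffset + bgNumBytes;
    }

    public static void main(String[] args) {
        long bgStart = 0;
        long bgNumBytes = 1; // block group holding a single byte of data
        // Assumed: with one byte, only data block 0 and the parity blocks hold data.
        int[] presentIndices = {0, 6, 7, 8};
        for (int idx : presentIndices) {
            long offset = plannedStartOffset(bgStart, idx);
            System.out.println("idx=" + idx + " offset=" + offset
                + " withinOwnGroup=" + withinOwnGroup(offset, bgStart, bgNumBytes));
        }
    }
}
```

Under these assumptions only index 0 stays inside the group's own range; every parity offset has a magnitude of at least 6 * cellSize, which is far past the single byte the group covers and therefore inside the range of whatever block group follows.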
So this patch takes another approach: refactor {{DFSInputStream}} by adding a
new {{refreshLocatedBlock}} method, to be called whenever a located block needs
to be refreshed, rather than calling {{getBlockAt}} as is done when the block
is first located. The refresh method can then be overridden in
{{DFSStripedInputStream}} to handle the internal block index.
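The shape of that refactoring can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: the classes below are simplified stand-ins for {{LocatedBlock}}, {{DFSInputStream}} and {{DFSStripedInputStream}}, and the {{generation}} field is a hypothetical counter used only to show that a fresh entry was fetched.

```java
/** Simplified stand-in for a located block; not the Hadoop LocatedBlock class. */
final class LocatedBlockSketch {
    final long startOffset;    // start offset of the block, or of its block group
    final int idxInBlockGroup; // -1 when the entry is not an internal block
    final int generation;      // hypothetical marker for how fresh the entry is

    LocatedBlockSketch(long startOffset, int idxInBlockGroup, int generation) {
        this.startOffset = startOffset;
        this.idxInBlockGroup = idxInBlockGroup;
        this.generation = generation;
    }
}

/** Stand-in for the base stream: refreshing only needs the start offset. */
class InputStreamSketch {
    protected int fetches = 0;

    /** Pretends to look up the entry covering the offset (assumed to be a block start). */
    protected LocatedBlockSketch getBlockAt(long offset) {
        fetches++;
        return new LocatedBlockSketch(offset, -1, fetches);
    }

    /** Single hook for refreshing a located block; subclasses may override it. */
    protected LocatedBlockSketch refreshLocatedBlock(LocatedBlockSketch block) {
        return getBlockAt(block.startOffset);
    }
}

/** Stand-in for the striped stream: the override adds index handling. */
class StripedInputStreamSketch extends InputStreamSketch {
    @Override
    protected LocatedBlockSketch refreshLocatedBlock(LocatedBlockSketch block) {
        // Fetch the refreshed block group by offset, then pick the internal
        // block by its index rather than by a per-block start offset.
        LocatedBlockSketch group = getBlockAt(block.startOffset);
        return new LocatedBlockSketch(group.startOffset, block.idxInBlockGroup, group.generation);
    }
}

final class RefreshSketchDemo {
    public static void main(String[] args) {
        InputStreamSketch in = new StripedInputStreamSketch();
        LocatedBlockSketch stale = new LocatedBlockSketch(0L, 7, 0);
        LocatedBlockSketch fresh = in.refreshLocatedBlock(stale);
        System.out.println("idx=" + fresh.idxInBlockGroup + " generation=" + fresh.generation);
    }
}
```

The point of the hook is that callers never need a distinct start offset per internal block: the offset identifies the block group, and the index carried by the stale entry identifies the internal block within it.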
> Erasure coding: properly assign start offset for internal blocks in a block
> group
> ---------------------------------------------------------------------------------
>
> Key: HDFS-8453
> URL: https://issues.apache.org/jira/browse/HDFS-8453
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Attachments: HDFS-8453-HDFS-7285.00.patch
>
>
> {{LocatedBlock#offset}} should indicate the "offset of the first byte of the
> block in the file". In a striped block group, we should properly assign this
> {{offset}} for internal blocks, so each internal block can be identified from
> a given offset.
> My current plan is to keep using {{bg.getStartOffset() + idxInBlockGroup *
> cellSize}} as the start offset for data blocks. For parity blocks, use {{-1 *
> (bg.getStartOffset() + idxInBlockGroup * cellSize)}}.