[ 
https://issues.apache.org/jira/browse/HDFS-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-8453:
----------------------------
    Status: Patch Available  (was: In Progress)

Actually, it is not possible to assign meaningful start offset values to all 
internal blocks, especially parity ones. Consider a block group with 1 byte of 
data. No matter how the start offsets for its parity blocks are set (negative 
values, etc.), they will overlap with the next block group in the file. 
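
A rough illustration of the 1-byte case, using the formula from the issue description. The layout (6 data + 3 parity blocks), the 64 KB cell size, the length of the second block group, and all class and method names are assumptions made for this sketch, not code from the patch:

```java
public class StartOffsetOverlap {
    // Assumed layout: 6 data blocks followed by 3 parity blocks, 64 KB cells.
    static final int DATA_BLOCKS = 6;
    static final int CELL_SIZE = 64 * 1024;

    /** Start offset as originally planned: positive for data, negated for parity. */
    static long plannedStartOffset(long bgStartOffset, int idxInBlockGroup) {
        long offset = bgStartOffset + (long) idxInBlockGroup * CELL_SIZE;
        return idxInBlockGroup < DATA_BLOCKS ? offset : -offset;
    }

    /** True if offset lies in the file range [start, start + length). */
    static boolean inRange(long offset, long start, long length) {
        return offset >= start && offset < start + length;
    }

    public static void main(String[] args) {
        long groupAStart = 0;
        long groupALength = 1;                          // block group with 1 byte of data
        long groupBStart = groupAStart + groupALength;  // next group starts at byte 1
        long groupBLength = 6L * 128 * 1024 * 1024;     // assumed size of the next group

        // First parity block (index 6) of group A.
        long parityOffset = plannedStartOffset(groupAStart, DATA_BLOCKS);
        long magnitude = Math.abs(parityOffset);

        System.out.println("parity start offset:  " + parityOffset);
        System.out.println("inside group A range: "
            + inRange(magnitude, groupAStart, groupALength));
        System.out.println("inside group B range: "
            + inRange(magnitude, groupBStart, groupBLength));
    }
}
```

Under these assumptions, the magnitude of the parity offset falls outside the 1-byte range of its own group and inside the range of the next group, so the offset alone cannot identify the internal block.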

So this patch takes another approach: refactor {{DFSInputStream}} by adding a new 
{{refreshLocatedBlock}} method, which is called whenever a located block needs 
to be refreshed, instead of calling {{getBlockAt}} directly. The refresh method 
can then be extended in {{DFSStripedInputStream}} with index handling.
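
The shape of that refactoring might look roughly like the following. This is only a simplified sketch with stand-in types: {{Block}}, {{ContiguousStream}}, {{StripedStream}} and {{lookup}} are hypothetical names, and the real classes and signatures in the patch may differ:

```java
import java.util.TreeMap;

public class RefreshLocatedBlockSketch {
    /** Hypothetical stand-in for a located block. */
    static final class Block {
        final long startOffset;  // start offset of the block (or its block group) in the file
        final int idxInGroup;    // index inside a block group; 0 for a contiguous block
        final String location;   // stand-in for the datanode locations

        Block(long startOffset, int idxInGroup, String location) {
            this.startOffset = startOffset;
            this.idxInGroup = idxInGroup;
            this.location = location;
        }
    }

    /** Contiguous case: the start offset alone identifies the block. */
    static class ContiguousStream {
        // Maps a start offset to the block(s) stored there (one entry per group).
        final TreeMap<Long, Block[]> blocksByStartOffset = new TreeMap<>();

        Block[] lookup(long offset) {
            // Sketch only: assumes some entry starts at or before the offset.
            return blocksByStartOffset.floorEntry(offset).getValue();
        }

        /** Single hook for refreshing a block, instead of offset lookups at each call site. */
        Block refreshLocatedBlock(Block block) {
            return lookup(block.startOffset)[0];
        }
    }

    /** Striped case: locate the group by offset, then pick the block by its index. */
    static class StripedStream extends ContiguousStream {
        @Override
        Block refreshLocatedBlock(Block block) {
            return lookup(block.startOffset)[block.idxInGroup];
        }
    }

    public static void main(String[] args) {
        StripedStream in = new StripedStream();
        in.blocksByStartOffset.put(0L, new Block[] {
            new Block(0, 0, "dn1"), new Block(0, 1, "dn2"), new Block(0, 2, "dn3")});

        Block stale = new Block(0, 2, "dn-old");
        System.out.println(in.refreshLocatedBlock(stale).location);
    }
}
```

In this sketch all internal blocks share the start offset of their group, and the index carried by the block selects the right one, so no per-block offset needs to be invented.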

> Erasure coding: properly assign start offset for internal blocks in a block 
> group
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-8453
>                 URL: https://issues.apache.org/jira/browse/HDFS-8453
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-8453-HDFS-7285.00.patch
>
>
> {{LocatedBlock#offset}} should indicate the "offset of the first byte of the 
> block in the file". In a striped block group, we should properly assign this 
> {{offset}} for internal blocks, so each internal block can be identified from 
> a given offset.
> My current plan is to keep using {{bg.getStartOffset() + idxInBlockGroup * 
> cellSize}} as the start offset for data blocks. For parity blocks, use {{-1 * 
> (bg.getStartOffset() + idxInBlockGroup * cellSize)}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
