[ 
https://issues.apache.org/jira/browse/HDFS-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341960#comment-15341960
 ] 

Rakesh R commented on HDFS-10460:
---------------------------------

Thanks [~drankye] for the detailed explanation. I have analysed this approach, 
and the logic is a little tricky.

We have two cases:

case-1) Say all DNs are working fine and there are no failures. Calculating the 
checksum needs {{requestedNumBytes}}, which is used to derive the exact internal 
block lengths from the {{blockGroup}}. At the beginning we set 
{{block.setNumBytes(getRemaining())}}, i.e. the requestedNumBytes, and this is 
in turn passed to the logic below to construct each internal block with the 
required number of bytes. If we leave numBytes unchanged, this logic will return 
the wrong number of bytes for reading the checksum data.
{code}
ExtendedBlock block = StripedBlockUtil.constructInternalBlock(
    blockGroup, ecPolicy.getCellSize(), numDataUnits, idx);
{code}
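
Just to make the case-1 dependency concrete, below is a small self-contained 
sketch of the cell-striping arithmetic that the internal-block construction 
relies on (illustration only, not the actual {{StripedBlockUtil}} code; the 
class and method names are made up):
{code}
/** Illustration only: derive a data block's length from the block group's numBytes. */
public class InternalBlockLengthSketch {
  static long dataBlockLength(long groupNumBytes, int cellSize, int numDataUnits, int idx) {
    long stripeSize = (long) cellSize * numDataUnits;      // bytes per full stripe
    long fullStripes = groupNumBytes / stripeSize;         // complete stripes
    long lastStripeLen = groupNumBytes % stripeSize;       // bytes in the partial stripe
    // portion of the partial stripe that lands on data block 'idx'
    long tail = Math.min(Math.max(lastStripeLen - (long) idx * cellSize, 0L), cellSize);
    return fullStripes * cellSize + tail;
  }

  public static void main(String[] args) {
    int cellSize = 65536, numDataUnits = 6;
    // requestedNumBytes = 10 (checksum range) vs. the full group length 393216
    System.out.println(dataBlockLength(10L, cellSize, numDataUnits, 0));      // 10
    System.out.println(dataBlockLength(393216L, cellSize, numDataUnits, 0));  // 65536
  }
}
{code}
Running it with requestedNumBytes = 10 versus the full 393216 shows the point: 
with the full group length, data block 0 would report 65536 bytes instead of the 
10 bytes the checksum range actually covers.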

case-2) A few DNs have failed. Reconstructing the block needs the 
{{actualNumBytes}}, and only after reconstruction is the checksum recalculated 
over the requestedNumBytes of data.
{code}
      ExtendedBlock reconBlockGroup = new ExtendedBlock(blockGroup);
      reconBlockGroup.setNumBytes(actualNumBytes);
{code}

What I'm trying to explain is:
- in case-1: it needs the {{blockGroup}} object with {{requestedNumBytes}}
- in case-2: it needs the {{reconBlockGroup}} object with {{requestedNumBytes}}

So either way there is a need for a dummy object carrying requestedNumBytes.

IMHO, we can keep the {{block.setNumBytes(getRemaining());}} logic for both 
replicated and striped blocks. Then we can treat reconstruction as the special 
case and create the {{reconBlockGroup}} object with actualNumBytes, as I'm 
doing in the current patch. What's your opinion?
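
To summarise the proposal in code form, here is a rough sketch (the helper 
methods are hypothetical placeholders to show where requestedNumBytes and 
actualNumBytes would live; this is not the patch itself):
{code}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;

/**
 * Sketch of the proposed flow; hasFailedDataNodes(), reconstructMissingBlocks()
 * and computeChecksum() are hypothetical placeholders, not existing APIs.
 */
abstract class StripedChecksumFlowSketch {

  void compute(ExtendedBlock blockGroup, long requestedNumBytes,
               long actualNumBytes) throws IOException {
    // Common to case-1 and case-2: keep block.setNumBytes(getRemaining()),
    // i.e. the block group carries the requested checksum range.
    blockGroup.setNumBytes(requestedNumBytes);

    if (hasFailedDataNodes()) {
      // Reconstruction is the special case: it needs the actual block length,
      // so use a copy and leave the original block group untouched.
      ExtendedBlock reconBlockGroup = new ExtendedBlock(blockGroup);
      reconBlockGroup.setNumBytes(actualNumBytes);
      reconstructMissingBlocks(reconBlockGroup);
    }

    // The checksum itself is always computed over requestedNumBytes.
    computeChecksum(blockGroup);
  }

  abstract boolean hasFailedDataNodes();
  abstract void reconstructMissingBlocks(ExtendedBlock reconBlockGroup) throws IOException;
  abstract void computeChecksum(ExtendedBlock blockGroup) throws IOException;
}
{code}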

> Erasure Coding: Recompute block checksum for a particular range less than 
> file size on the fly by reconstructing missed block
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10460
>                 URL: https://issues.apache.org/jira/browse/HDFS-10460
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: HDFS-10460-00.patch, HDFS-10460-01.patch
>
>
> This jira is HDFS-9833 follow-on task to address reconstructing block and 
> then recalculating block checksum for a particular range query.
> For example,
> {code}
> // create a file 'stripedFile1' with fileSize = cellSize * numDataBlocks = 65536 * 6 = 393216
> FileChecksum stripedFileChecksum = getFileChecksum(stripedFile1, 10, true);
> {code}
