[
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877451#comment-14877451
]
Zhe Zhang commented on HDFS-9040:
---------------------------------
Thanks for the comment, Jing. The discussion is getting more and more
interesting :)
bq. our current implementation (with GS bump) does not have the guarantee that
an internal block with higher GS must have longer safe length
This is a great observation. I used to think the GS was helpful for detecting
stale UC replicas in the read-being-written scenario, but actually reading from
a "slow replica" is as bad as reading from a stale one.
bq. To recover the lease, the NN may have to contact all the DataNodes and
identify the "safe length" of the block group.
The current lease recovery algorithm searches for the minimal length among all
"good" replicas (those with the correct GS), and then truncates all other
"good" replicas to that length. Does "safe length" refer to this minimal
length? As indicated in the HADOOP-1700 design
[doc|https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc],
the option of growing all "good" replicas to the maximum length was also
considered but abandoned due to overhead concerns. We could also consider doing
some data reconstruction during EC file lease recovery.
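To make that minimal-length rule concrete, here is a small sketch of it. The
Replica holder for the (GS, length) pair each DataNode reports is hypothetical,
not the actual recovery classes:
{code:java}
import java.util.List;

/**
 * Simplified sketch of the minimal-length rule, assuming a hypothetical
 * Replica holder; not the actual HDFS recovery code.
 */
class LeaseRecoverySketch {
  static class Replica {
    final long genStamp;   // generation stamp reported by the DataNode
    final long numBytes;   // on-disk length reported by the DataNode
    Replica(long genStamp, long numBytes) {
      this.genStamp = genStamp;
      this.numBytes = numBytes;
    }
  }

  /**
   * Returns the length all "good" replicas are truncated to: the minimum
   * length among replicas carrying the expected GS. Stale-GS replicas are
   * discarded entirely, even if they happen to be longer.
   */
  static long recoveryLength(List<Replica> replicas, long expectedGS) {
    long min = Long.MAX_VALUE;
    for (Replica r : replicas) {
      if (r.genStamp == expectedGS) {
        min = Math.min(min, r.numBytes);
      }
    }
    return min == Long.MAX_VALUE ? 0 : min;
  }
}
{code}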
I'm still trying to understand why we discard replicas with stale GS during
lease recovery. Per Jing's analysis, for non-EC files a replica with a higher
GS should have a larger length anyway, so this question was not important. But
in lease recovery for EC files, shouldn't we just make the decision based on
the lengths of the internal blocks? From another angle, if internal_block_1 has
a larger GS but a smaller length than internal_block_2, doesn't that mean
internal_block_2 is fresher?
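A length-only decision for EC files could look like the following sketch. It
assumes an RS(6,3) schema with a fixed 64 KB cell size (both are assumptions
for illustration, not the patch's actual constants): a stripe is decodable iff
at least 6 of the 9 internal blocks contain its cell, so the 6th-longest
internal block bounds the safe length, regardless of any block's GS.
{code:java}
import java.util.Arrays;

/**
 * Hypothetical safe-length computation from internal block lengths alone,
 * assuming RS(6,3) striping with a fixed cell size.
 */
class StripedSafeLengthSketch {
  static final int DATA_UNITS = 6;          // RS(6,3): 6 data + 3 parity
  static final long CELL_SIZE = 64 * 1024;  // assumed 64 KB cells

  static long safeLength(long[] internalBlockLens) {
    long[] lens = internalBlockLens.clone();
    Arrays.sort(lens);  // ascending
    // 6th-longest length sits at index (length - DATA_UNITS) after sorting
    long sixthLongest = lens[lens.length - DATA_UNITS];
    long fullStripes = sixthLongest / CELL_SIZE;   // align down to stripes
    return fullStripes * DATA_UNITS * CELL_SIZE;   // decodable bytes
  }
}
{code}
Under these assumptions, an internal block with a higher GS but a shorter
length simply contributes less decodable data, which is why length rather than
GS looks like the natural ordering here.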
Without append / truncate, the only use case for GS I can think of is
"Datanodes storing legacy blocks were dead for a long time and re-join the
cluster" (HADOOP-1497). [~jingzhao] Do you think this is why we consider GS in
calculating "safe length"?
> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests
> to Coordinator)
> -------------------------------------------------------------------------------------------
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Walter Su
> Attachments: HDFS-9040-HDFS-7285.002.patch,
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch,
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with the NN to allocate/update
> blocks, while the StripedDataStreamer instances only have to stream blocks
> to DNs.
> Proposal 2:
> See below the
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
> from [~jingzhao].
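For illustration, the coordinator pattern of Proposal 1 above might be sketched
as follows; the class and method names are made up here, not the ones in the
attached patches. One coordinator performs every NameNode RPC and fans the
resulting internal blocks out, so the streamers never talk to the NN
themselves.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative-only sketch of Proposal 1's coordinator pattern. */
class BlockGroupCoordinatorSketch {
  // Each streamer takes its next internal block from a queue instead of
  // calling the NameNode itself.
  private final List<BlockingQueue<String>> perStreamerBlocks = new ArrayList<>();

  BlockGroupCoordinatorSketch(int numStreamers) {
    for (int i = 0; i < numStreamers; i++) {
      perStreamerBlocks.add(new LinkedBlockingQueue<>());
    }
  }

  /** One NN round trip allocates the whole group; fan out the pieces. */
  void allocateBlockGroup() {
    // placeholder for the single addBlock-style RPC returning the group
    for (int i = 0; i < perStreamerBlocks.size(); i++) {
      perStreamerBlocks.get(i).offer("internalBlock-" + i);
    }
  }

  /** Streamer i blocks here until the coordinator hands it a block. */
  String takeBlock(int i) throws InterruptedException {
    return perStreamerBlocks.get(i).take();
  }
}
{code}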