[jira] [Commented] (HDFS-2288) Replicas awaiting recovery should return a full visible length

Konstantin Shvachko (Commented) (JIRA) Fri, 09 Mar 2012 01:26:23 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225956#comment-13225956
 ]


Konstantin Shvachko commented on HDFS-2288:
-------------------------------------------

> My understanding of visible length is "the length that all datanodes in the 
> pipeline contain at least such amount of data."

There is no trusted source to obtain such information, unless you keep it in 
ZooKeeper or want to address the Byzantine Generals' Problem internally, which 
we don't.

Let me try to explain the notion of *visible length*. 
As per the [design doc|https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf],
visible length is the _"number of bytes that have been acknowledged by the 
downstream DataNodes"_. It is replica (not block) specific, meaning it can be 
different for different replicas at a given time. In the document it is called 
BA (bytes acknowledged), compared to BR (bytes received).

If we have three replicas, r1, r2, and r3, all of them could have received the 
same number of bytes:
r1.BR = r2.BR = r3.BR, 
but their visible lengths differ, because r3 has not yet acknowledged the 
latest packet to r2 and r1. Until it does:
r3.BA = r3.BR
r2.BA = r2.BR - p
r1.BA = r1.BR - p
where p is the packet length.
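
The BR/BA bookkeeping above can be sketched in a few lines of Java. This is only an illustration, not HDFS code: the class names ReplicaState and PipelineSketch and their methods are hypothetical, and the model assumes that the last replica in the pipeline counts a packet as acknowledged as soon as it receives it, while upstream replicas wait for its ack.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-replica counters; the names are not taken from HDFS. */
final class ReplicaState {
    final String name;
    long bytesReceived; // BR in the design doc
    long bytesAcked;    // BA in the design doc, i.e. this replica's visible length

    ReplicaState(String name) {
        this.name = name;
    }
}

/** Hypothetical write pipeline r1 -> r2 -> ... -> rN. */
final class PipelineSketch {
    final List<ReplicaState> replicas = new ArrayList<>();

    PipelineSketch(String... names) {
        for (String n : names) {
            replicas.add(new ReplicaState(n));
        }
    }

    /** Every replica receives the packet; only the last one has nothing downstream to wait for. */
    void receivePacket(long packetLength) {
        for (ReplicaState r : replicas) {
            r.bytesReceived += packetLength;
        }
        ReplicaState last = replicas.get(replicas.size() - 1);
        last.bytesAcked = last.bytesReceived;
    }

    /** The ack sent by the last replica reaches every upstream replica. */
    void propagateAck() {
        long acked = replicas.get(replicas.size() - 1).bytesAcked;
        for (ReplicaState r : replicas) {
            r.bytesAcked = Math.max(r.bytesAcked, acked);
        }
    }
}
```

Calling receivePacket(p) without a following propagateAck() leaves the model in exactly the state described above: all BR values equal, r3.BA = r3.BR, and r1.BA and r2.BA lagging by p.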

Now when a client reads a byte, it first checks with one of the replicas, say 
r3, whether the byte is visible. The last received byte is visible in r3, 
which means the client can read it from any replica. When the client reads the 
last received byte from r1, it sends r1 the visible length obtained from r3. 
The DN containing r1 sees that the client has already confirmed with another 
replica that the byte is visible there, and lets the client read that byte 
even though it is not yet locally visible.
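
The check made by the DN containing r1 might look roughly like the sketch below. The class ReadCheckSketch and the method canServe are hypothetical names, not the actual DataNode API, and the extra condition that the byte must already have been received locally is an assumption of this sketch.

```java
/** Hypothetical read-side check; not the actual DataNode code. */
final class ReadCheckSketch {
    /**
     * @param offset              zero-based offset of the byte the client wants
     * @param localBytesReceived  BR of the local replica
     * @param localBytesAcked     BA of the local replica (its own visible length)
     * @param clientVisibleLength visible length the client obtained from another replica
     * @return true if the byte may be served to this client
     */
    static boolean canServe(long offset,
                            long localBytesReceived,
                            long localBytesAcked,
                            long clientVisibleLength) {
        // Trust the larger of the local visible length and the one the client brings along.
        long effectiveVisibleLength = Math.max(localBytesAcked, clientVisibleLength);
        // Assumption: a byte that has not arrived locally cannot be served at all.
        return offset < effectiveVisibleLength && offset < localBytesReceived;
    }
}
```

With BR = 1024 and BA = 512 on r1, a client that presents a visible length of 1024 from r3 may read offset 1023, while a client that presents only 512 may not.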

So our consistency guarantee is that once a client has read a byte from one 
replica, that client (or any other client aware of that fact) can read the 
same byte from any other replica.
                
> Replicas awaiting recovery should return a full visible length
> --------------------------------------------------------------
>
>                 Key: HDFS-2288
>                 URL: https://issues.apache.org/jira/browse/HDFS-2288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.24.0
>
>         Attachments: hdfs-2288.txt
>
>
> Currently, if the client calls getReplicaVisibleLength for a RWR, it returns 
> a visible length of 0. This causes one of HBase's tests to fail, and I 
> believe it's incorrect behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
