[
https://issues.apache.org/jira/browse/HDFS-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482574#comment-13482574
]
Tsz Wo (Nicholas), SZE commented on HDFS-3219:
----------------------------------------------
Hi Yanbo,
First of all, the visible length is not the same as the data length on the
datanodes. One nice property of the visible length is that if the visible
length on one datanode is N, then every datanode holding the block has data
length >= N, so clients can start with any datanode and fail over to any other
datanode.
A few more details: in #2 of your example, we have the write pipeline
Client -> Datanode_1 -> Datanode_2 -> Datanode_3
Then, we have
BR_1 >= BR_2 >= BR_3 >= BA_3 >= BA_2 >= BA_1
where BR_i is the "bytes received" and BA_i is the "bytes acked" at Datanode_i.
Datanode_i takes BA_i as the visible length since BR_i >= BA_j for any i and j.
For more details, please see Section 3.2 in the HDFS-265 design doc
(appendDesign3.pdf).
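
The ordering above can be checked with a small simulation. This is only a
sketch in Python, not HDFS code: Replica, receive, ack, invariant_holds and
can_serve are hypothetical names made up for illustration, and the model
assumes a healthy pipeline with no datanode failure or pipeline recovery.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    bytes_received: int = 0  # BR_i
    bytes_acked: int = 0     # BA_i, used as this replica's visible length


def receive(pipeline, i, nbytes):
    """Datanode i (0-based) receives nbytes from its upstream neighbour."""
    new_br = pipeline[i].bytes_received + nbytes
    # A datanode can only be forwarded bytes its upstream node already has.
    if i > 0 and new_br > pipeline[i - 1].bytes_received:
        raise ValueError("upstream has not received these bytes yet")
    pipeline[i].bytes_received = new_br


def ack(pipeline, i, upto):
    """Datanode i records that the first `upto` bytes are acked downstream."""
    last = len(pipeline) - 1
    # The last datanode acks what it has received; the others relay the ack
    # of their downstream neighbour.
    if i == last:
        limit = pipeline[i].bytes_received
    else:
        limit = pipeline[i + 1].bytes_acked
    if upto > limit:
        raise ValueError("cannot ack bytes not yet acked downstream")
    pipeline[i].bytes_acked = max(pipeline[i].bytes_acked, upto)


def invariant_holds(pipeline):
    """True if BR_1 >= ... >= BR_n >= BA_n >= ... >= BA_1."""
    chain = [r.bytes_received for r in pipeline]
    chain += [r.bytes_acked for r in reversed(pipeline)]
    return all(a >= b for a, b in zip(chain, chain[1:]))


def can_serve(pipeline, length):
    """True if every replica holds at least `length` bytes."""
    return all(r.bytes_received >= length for r in pipeline)


def demo():
    # Client -> Datanode_1 -> Datanode_2 -> Datanode_3
    pipeline = [Replica(), Replica(), Replica()]
    events = [
        (receive, 0, 100), (receive, 1, 100), (receive, 0, 50),
        (receive, 2, 100), (ack, 2, 100), (ack, 1, 100),
    ]
    for step, i, n in events:
        step(pipeline, i, n)
        assert invariant_holds(pipeline)
    return pipeline


pipeline = demo()
visible = [r.bytes_acked for r in pipeline]
```

With these events the received lengths end up as [150, 100, 100] and the
visible lengths as [0, 100, 100]: the ack has not yet reached Datanode_1, but a
reader that obtained visible length 100 from Datanode_2 can still fail over to
Datanode_1 or Datanode_3, because can_serve(pipeline, 100) is true.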
> Disambiguate "visible length" in the code and docs
> --------------------------------------------------
>
> Key: HDFS-3219
> URL: https://issues.apache.org/jira/browse/HDFS-3219
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Eli Collins
> Priority: Minor
>
> Per HDFS-2288, there are two definitions of visible length, or rather we're using
> the same name for two things:
> 1. The HDFS-265 design doc, which defines it as a property of the replica:
> {quote}
> visible length is the "number of bytes that have been acknowledged by the
> downstream DataNodes". It is replica (not block) specific, meaning it can be
> different for different replicas at a given time. In the document it is
> called BA (bytes acknowledged), compared to BR (bytes received).
> {quote}
> 2. The definition in HDFS-814 and DFSClient#getVisibleLength which defines it
> as a property of a file:
> {quote}
> The visible length is the length that *all* datanodes in the pipeline contain
> at least such amount of data. Therefore, these data are visible to the
> readers.
> {quote}
> According to this definition the visible length of a file is the floor of all
> visible lengths of all the replicas of the last block. It's a static property
> set on open, e.g. it is not updated when a writer calls hflush. Also
> DFSInputStream#readBlockLength returns the 1st visible length of a replica it
> finds, so it seems possible (though unlikely) in a failure scenario it could
> return a length that was longer than what all replicas had.
> This has caused confusion in a number of other jiras. We should update the
> design doc, java doc, perhaps rename DFSClient#getVisibleLength etc to
> disambiguate this.