Disambiguate "visible length" in the code and docs
--------------------------------------------------
Key: HDFS-3219
URL: https://issues.apache.org/jira/browse/HDFS-3219
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Eli Collins
Priority: Minor
Per HDFS-2288, there are two definitions of visible length, or rather we're using the
same name for two things:
# The HDFS-265 design doc, which defines it as a property of the replica:
{quote}
visible length is the "number of bytes that have been acknowledged by the
downstream DataNodes". It is replica (not block) specific, meaning it can be
different for different replicas at a given time. In the document it is called
BA (bytes acknowledged), compared to BR (bytes received).
{quote}
# The definition in HDFS-814 and DFSClient#getVisibleLength which defines it as
a property of a file:
{quote}
The visible length is the length that *all* datanodes in the pipeline contain
at least such amount of data. Therefore, these data are visible to the readers.
According to this definition the visible length of a file is the floor of all
visible lengths of all the replicas of the last block. It's a static property
set on open, i.e. it is not updated when a writer calls hflush. Also
DFSInputStream#readBlockLength returns the 1st visible length of a replica it
finds, so it seems possible (though unlikely) in a failure scenario it could
return a length that was longer than what all replicas had.
{quote}
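To make the difference between the two senses concrete, here is a minimal Java sketch. It is not HDFS code: the class VisibleLengthSketch, the Replica type, and both helper methods are hypothetical names invented for illustration, and the byte counts are made up. It only models what the two definitions above say: the replica-level value is BA, the file-level value is the floor over the replicas of the last block, and taking the first replica's value can overshoot that floor.

```java
import java.util.Arrays;
import java.util.List;

public class VisibleLengthSketch {
    /** Hypothetical stand-in for one replica of the last (under-construction) block. */
    static final class Replica {
        final long bytesReceived; // BR in the HDFS-265 design doc
        final long bytesAcked;    // BA: the replica-level "visible length"

        Replica(long bytesReceived, long bytesAcked) {
            this.bytesReceived = bytesReceived;
            this.bytesAcked = bytesAcked;
        }
    }

    /** File-level sense: completed blocks plus the minimum BA over the last block's replicas. */
    static long fileVisibleLength(long completedBlocksLength, List<Replica> lastBlockReplicas) {
        if (lastBlockReplicas.isEmpty()) {
            // Assumption for this sketch: no known replica means the last block adds nothing.
            return completedBlocksLength;
        }
        long min = Long.MAX_VALUE;
        for (Replica r : lastBlockReplicas) {
            min = Math.min(min, r.bytesAcked);
        }
        return completedBlocksLength + min;
    }

    /** Models "return the first replica's visible length that is found". */
    static long firstReplicaLength(long completedBlocksLength, List<Replica> lastBlockReplicas) {
        if (lastBlockReplicas.isEmpty()) {
            return completedBlocksLength;
        }
        return completedBlocksLength + lastBlockReplicas.get(0).bytesAcked;
    }

    public static void main(String[] args) {
        // Three replicas of the last block, in the order a reader happens to see them.
        List<Replica> replicas = Arrays.asList(
                new Replica(250, 200),
                new Replica(280, 150),
                new Replica(300, 100));
        System.out.println(fileVisibleLength(1024, replicas));  // 1124
        System.out.println(firstReplicaLength(1024, replicas)); // 1224
    }
}
```

In this made-up scenario the first-replica answer (1224) exceeds the floor (1124), which is the kind of mismatch described above for DFSInputStream#readBlockLength.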
This has caused confusion in a number of other jiras. We should update the
design doc and Javadoc, and perhaps rename DFSClient#getVisibleLength, to
disambiguate the two meanings.