[ 
https://issues.apache.org/jira/browse/HDFS-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3219:
------------------------------

    Description: 
Per HDFS-2288, there are two definitions of visible length, or rather we're 
using the same name for two things:

1. The HDFS-265 design doc, which defines it as a property of the replica:

{quote}
visible length is the "number of bytes that have been acknowledged by the 
downstream DataNodes". It is replica (not block) specific, meaning it can be 
different for different replicas at a given time. In the document it is called 
BA (bytes acknowledged), compared to BR (bytes received).
{quote}

2. The definition in HDFS-814 and DFSClient#getVisibleLength which defines it 
as a property of a file:

{quote}
The visible length is the length that *all* datanodes in the pipeline contain 
at least such amount of data. Therefore, these data are visible to the readers.

According to this definition the visible length of a file is the floor (that 
is, the minimum) of the visible lengths of all the replicas of the last block. 
It's a static property set on open, e.g. it is not updated when a writer calls 
hflush. Also, DFSInputStream#readBlockLength returns the first replica visible 
length it finds, so it seems possible (though unlikely) that in a failure 
scenario it could return a length longer than what all replicas have.
{quote}

This has caused confusion in a number of other jiras. We should update the 
design doc and javadoc, and perhaps rename DFSClient#getVisibleLength etc., to 
disambiguate the two meanings.
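
To make the two meanings concrete, here is a minimal sketch. It is 
illustrative only: the class, field, and method names are hypothetical and are 
not the actual DFSClient, DFSInputStream, or DataNode code, and the byte counts 
are made up. It contrasts the per-replica value (BA), the file-level value 
taken as the minimum over the replicas of the last block, and the 
"first replica found" behaviour described for readBlockLength.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these types exist in HDFS.
public class VisibleLengthSketch {

    // State of one replica of the last block, using the HDFS-265 terms:
    // BA = bytes acknowledged, BR = bytes received.
    public static final class Replica {
        final String datanode;
        final long bytesAcked;
        final long bytesReceived;

        public Replica(String datanode, long bytesAcked, long bytesReceived) {
            this.datanode = datanode;
            this.bytesAcked = bytesAcked;
            this.bytesReceived = bytesReceived;
        }
    }

    // Meaning 1: visible length as a property of a single replica (its BA).
    public static long replicaVisibleLength(Replica r) {
        return r.bytesAcked;
    }

    // Meaning 2: visible length as a property of the file, i.e. the amount
    // of data that every replica of the last block has (the minimum BA).
    public static long fileVisibleLength(List<Replica> replicas) {
        return replicas.stream()
                .mapToLong(VisibleLengthSketch::replicaVisibleLength)
                .min()
                .orElse(0L);
    }

    // The concern raised above: answering with the first replica's value
    // instead of the minimum over all replicas.
    public static long firstReplicaVisibleLength(List<Replica> replicas) {
        return replicas.isEmpty() ? 0L : replicaVisibleLength(replicas.get(0));
    }

    public static void main(String[] args) {
        List<Replica> lastBlock = Arrays.asList(
                new Replica("dn1", 768L, 1024L),
                new Replica("dn2", 512L, 1024L),
                new Replica("dn3", 1024L, 1024L));

        System.out.println("file-level (min):    " + fileVisibleLength(lastBlock));
        System.out.println("first replica found: " + firstReplicaVisibleLength(lastBlock));
    }
}
```

With these made-up numbers the first-replica answer (768) is larger than the 
minimum over all replicas (512), which is the mismatch the two definitions 
allow.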

  was:
Per HDFS-2288, there are two definitions of visible length, or rather we're 
using the same name for two things:

# The HDFS-265 design doc, which defines it as a property of the replica:

{quote}
visible length is the "number of bytes that have been acknowledged by the 
downstream DataNodes". It is replica (not block) specific, meaning it can be 
different for different replicas at a given time. In the document it is called 
BA (bytes acknowledged), compared to BR (bytes received).
{quote}

# The definition in HDFS-814 and DFSClient#getVisibleLength which defines it as 
a property of a file:

{quote}
The visible length is the length that *all* datanodes in the pipeline contain 
at least such amount of data. Therefore, these data are visible to the readers.

According to this definition the visible length of a file is the floor (that 
is, the minimum) of the visible lengths of all the replicas of the last block. 
It's a static property set on open, e.g. it is not updated when a writer calls 
hflush. Also, DFSInputStream#readBlockLength returns the first replica visible 
length it finds, so it seems possible (though unlikely) that in a failure 
scenario it could return a length longer than what all replicas have.
{quote}

This has caused confusion in a number of other jiras. We should update the 
design doc and javadoc, and perhaps rename DFSClient#getVisibleLength etc., to 
disambiguate the two meanings.

    
> Disambiguate "visible length" in the code and docs
> --------------------------------------------------
>
>                 Key: HDFS-3219
>                 URL: https://issues.apache.org/jira/browse/HDFS-3219
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Eli Collins
>            Priority: Minor


        
