[ 
https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huafeng Wang reassigned HDFS-12222:
-----------------------------------

    Assignee: Huafeng Wang

> Add EC information to BlockLocation
> -----------------------------------
>
>                 Key: HDFS-12222
>                 URL: https://issues.apache.org/jira/browse/HDFS-12222
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>            Assignee: Huafeng Wang
>              Labels: hdfs-ec-3.0-nice-to-have
>
> HDFS applications query block location information to compute splits. One 
> example of this is FileInputFormat:
> https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346
> You see bits of code like this that calculate offsets as follows:
> {noformat}
>     long bytesInThisBlock = blkLocations[startIndex].getOffset() + 
>                           blkLocations[startIndex].getLength() - offset;
> {noformat}
> EC confuses this since the block locations include parity block locations as 
> well, which are not part of the logical file length. This messes up the 
> offset calculation and thus topology/caching information too.
> Applications can figure out what's a parity block by reading the EC policy 
> and then parsing the schema, but it'd be a lot better if we exposed this more 
> generically in BlockLocation instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to