[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122607#comment-16122607 ]
Kai Zheng commented on HDFS-12222:
----------------------------------

I thought a little bit more about this, Huafeng. Could you help check whether it works for you? Thanks!

Even if we use the ECSchema info in Hadoop Common code, it is still tricky to use that info there to parse erasure-coded block locations, since doing so may couple Hadoop Common to HDFS internals. Could we have a Hadoop Common class like {{ErasureCodedBlockLocation}} that contains methods to get the data and parity block locations plus the cell size info, and that can be passed into a new {{LocatedFileStatus}} constructor? The ErasureCodedBlockLocation object can then be constructed on the HDFS side from the parsed info.

> Add EC information to BlockLocation
> -----------------------------------
>
>                 Key: HDFS-12222
>                 URL: https://issues.apache.org/jira/browse/HDFS-12222
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>            Assignee: Huafeng Wang
>              Labels: hdfs-ec-3.0-nice-to-have
>
> HDFS applications query block location information to compute splits. One
> example of this is FileInputFormat:
> https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346
> You see bits of code like this that calculate offsets as follows:
> {noformat}
> long bytesInThisBlock = blkLocations[startIndex].getOffset() +
>     blkLocations[startIndex].getLength() - offset;
> {noformat}
> EC confuses this since the block locations include parity block locations as
> well, which are not part of the logical file length. This messes up the
> offset calculation and thus topology/caching information too.
> Applications can figure out what's a parity block by reading the EC policy
> and then parsing the schema, but it'd be a lot better if we exposed this more
> generically in BlockLocation instead.
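To make the {{ErasureCodedBlockLocation}} proposal concrete, here is a minimal Java sketch. Only the class name comes from the comment; the constructor, the accessors ({{getDataLocations}}, {{getParityLocations}}, {{getCellSize}}), the helpers {{bytesFrom}} and {{hostForOffset}}, the hostnames, and the round-robin cell layout are illustrative assumptions, not an existing Hadoop API. Parity hosts are kept in a separate array, so the FileInputFormat-style offset arithmetic only ever sees the logical (data-only) length. The main method uses 6 data + 3 parity units with 1 MiB cells, the same shape as the RS-6-3-1024k policy.

```java
import java.util.Arrays;

// Hypothetical sketch: only the class name comes from the HDFS-12222 discussion.
// Fields, constructor and methods are illustrative assumptions, not a Hadoop API.
class ErasureCodedBlockLocation {
    private final String[] dataHosts;   // hosts of the data units, in unit order
    private final String[] parityHosts; // hosts of the parity units (not file data)
    private final long offset;          // logical offset of the block group in the file
    private final long length;          // logical length: data bytes only, no parity
    private final int cellSize;         // striping cell size in bytes

    ErasureCodedBlockLocation(String[] dataHosts, String[] parityHosts,
                              long offset, long length, int cellSize) {
        if (dataHosts.length == 0 || cellSize <= 0) {
            throw new IllegalArgumentException("need data hosts and a positive cell size");
        }
        this.dataHosts = dataHosts.clone();
        this.parityHosts = parityHosts.clone();
        this.offset = offset;
        this.length = length;
        this.cellSize = cellSize;
    }

    String[] getDataLocations()   { return dataHosts.clone(); }
    String[] getParityLocations() { return parityHosts.clone(); }
    int getCellSize()             { return cellSize; }
    long getOffset()              { return offset; }
    long getLength()              { return length; }

    // Same arithmetic as the FileInputFormat snippet, but the length already
    // excludes parity, so the result counts only logical file bytes.
    long bytesFrom(long fileOffset) {
        return offset + length - fileOffset;
    }

    // Host holding the logical byte at fileOffset, assuming cells are striped
    // round-robin across the data units of the block group.
    String hostForOffset(long fileOffset) {
        if (fileOffset < offset || fileOffset >= offset + length) {
            throw new IllegalArgumentException("offset outside this block group");
        }
        long cellIndex = (fileOffset - offset) / cellSize;
        return dataHosts[(int) (cellIndex % dataHosts.length)];
    }

    public static void main(String[] args) {
        // 6 data + 3 parity units with 1 MiB cells; hostnames are made up.
        int cell = 1024 * 1024;
        ErasureCodedBlockLocation loc = new ErasureCodedBlockLocation(
            new String[] {"dn1", "dn2", "dn3", "dn4", "dn5", "dn6"},
            new String[] {"dn7", "dn8", "dn9"},
            0L, 6L * 8 * cell, cell);
        System.out.println(loc.bytesFrom(cell));           // 49283072 (47 cells)
        System.out.println(loc.hostForOffset(7L * cell));  // dn2 (cell 7 -> data unit 1)
        System.out.println(Arrays.toString(loc.getParityLocations())); // [dn7, dn8, dn9]
    }
}
```

Because the logical length never includes parity, existing split code could keep its offset arithmetic unchanged; locality decisions would look only at the data hosts.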
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)