[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156634#comment-16156634 ]

Huafeng Wang commented on HDFS-12222:
-------------------------------------

Hi [~andrew.wang], thanks for your review! I just uploaded a new patch. In this patch I mainly:
* Removed the {{getECBlockLocation}} function and the {{ECBlockLocation}} class.
* Fixed {{getFileBlockLocation}} of {{DFSClient}}.
* Added comments about {{getFileBlockLocation}}, {{listFiles}} and {{listLocatedStatus}} in {{FileSystem}}, {{DistributedFileSystem}} and {{FileContext}}.
* Added comments about {{makeQualifiedLocated}} in {{HdfsLocatedFileStatus}}.
* Added tests for {{DistributedFileSystem.getFileBlockLocation}}, {{DistributedFileSystem.listFiles}}, {{FileContext.getFileBlockLocation}} and {{FileContext.listFiles}} on erasure-coded files of various sizes.
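One invariant that tests over "various file sizes" could check is sketched below. It rests on assumptions rather than on the patch itself: for a policy with k data units and an internal block size of B bytes, each block group holds at most k * B logical bytes (parity excluded), so the per-group logical lengths for a file of length L should sum to exactly L. The class {{EcBlockGroupLayout}} and its method are hypothetical helpers, not HDFS APIs, and the sketch ignores cell-level striping, which does not change per-group logical lengths.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HDFS: expected logical length of each
// block group of an erasure-coded file, assuming a group holds at most
// numDataUnits * blockSize bytes of file data (parity bytes excluded).
class EcBlockGroupLayout {
    static List<Long> groupLengths(long fileLength, int numDataUnits, long blockSize) {
        if (fileLength < 0 || numDataUnits <= 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long groupCapacity = numDataUnits * blockSize; // logical bytes per group
        List<Long> lengths = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += groupCapacity) {
            // The last group may be partial.
            lengths.add(Math.min(groupCapacity, fileLength - offset));
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long[] sizes = {0, 1, 6 * mib, 6 * mib + 1, 20 * mib};
        for (long size : sizes) {
            List<Long> lengths = groupLengths(size, 6, mib); // 6 data units, assumed 1 MiB blocks
            long total = lengths.stream().mapToLong(Long::longValue).sum();
            System.out.println(size + " bytes -> " + lengths + " (sum " + total + ")");
        }
    }
}
```

Boundary sizes (empty, one byte, exactly one full group, one byte past a full group) are where off-by-one errors in block-group accounting tend to show up.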

And about 
{quote}
Could you verify that fsck -files -blocks -locations still returns parity 
blocks?
{quote}

I checked the output of {{fsck -files -blocks -locations}}; it does not give very detailed block location info for an erasure-coded file. For example, the output for a 6+3 erasure-coded file looks like this:
{code}
0. BP-417570284-10.239.160.132-1504687036886:blk_-9223372036854775792_1001 len=6291456 Live_repl=9
[blk_-9223372036854775792:DatanodeInfoWithStorage[127.0.0.1:54859,DS-09a24593-5cbc-444c-ad43-ab1b39c65887,DISK](LIVE),
 blk_-9223372036854775791:DatanodeInfoWithStorage[127.0.0.1:54863,DS-80d7a2bb-5acc-437c-936a-bd28314e2a8c,DISK](LIVE),
 blk_-9223372036854775790:DatanodeInfoWithStorage[127.0.0.1:54883,DS-05a880c7-0fa2-4683-a382-06ec7d975fd3,DISK](LIVE),
 blk_-9223372036854775789:DatanodeInfoWithStorage[127.0.0.1:54854,DS-8a5cf2da-1c7e-4942-b57c-8755ddb3cfcb,DISK](LIVE),
 blk_-9223372036854775788:DatanodeInfoWithStorage[127.0.0.1:54871,DS-95c64656-3131-413c-b400-0f14612b387d,DISK](LIVE),
 blk_-9223372036854775787:DatanodeInfoWithStorage[127.0.0.1:54867,DS-fbf6ea90-8829-44ce-8681-b5f53be726c1,DISK](STALE_BLOCK_CONTENT),
 blk_-9223372036854775786:DatanodeInfoWithStorage[127.0.0.1:54875,DS-d40bfede-c5c9-4cb0-8b5e-92ead1bbb4da,DISK](LIVE),
 blk_-9223372036854775785:DatanodeInfoWithStorage[127.0.0.1:54879,DS-c999124f-3d0e-4f6c-bd31-5f0fdff86fca,DISK](STALE_BLOCK_CONTENT),
 blk_-9223372036854775784:DatanodeInfoWithStorage[127.0.0.1:54850,DS-7ff8f0ed-b62a-40a9-8966-b16f71532712,DISK](LIVE)]
{code}
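The nine entries in the listing above are the internal blocks of a single block group. The sketch below shows how data and parity blocks could be told apart from the IDs alone. It assumes that the low 4 bits of a striped internal block ID hold the block's index within its group (so clearing them yields the block group ID), and that with a 6+3 policy, indices 0-5 are data blocks and 6-8 are parity blocks. {{StripedBlockIdSketch}} is a hypothetical name and works on raw {{long}} IDs, not on HDFS classes.

```java
// Hypothetical helper: decodes striped internal block IDs as printed by fsck,
// assuming the low 4 bits encode the block's index within its block group.
class StripedBlockIdSketch {
    static final long GROUP_INDEX_MASK = 15L; // assumption: 4 index bits

    static int indexInGroup(long internalBlockId) {
        return (int) (internalBlockId & GROUP_INDEX_MASK);
    }

    static long blockGroupId(long internalBlockId) {
        return internalBlockId & ~GROUP_INDEX_MASK;
    }

    // Indices at or beyond the number of data units are parity blocks.
    static boolean isParity(long internalBlockId, int numDataUnits) {
        return indexInGroup(internalBlockId) >= numDataUnits;
    }

    public static void main(String[] args) {
        // First internal block ID from the 6+3 fsck output above.
        long first = -9223372036854775792L;
        for (long id = first; id <= first + 8; id++) {
            System.out.printf("blk_%d: index=%d group=%d %s%n",
                id, indexInGroup(id), blockGroupId(id),
                isParity(id, 6) ? "parity" : "data");
        }
    }
}
```

Under these assumptions, of the two STALE_BLOCK_CONTENT entries, blk_-9223372036854775787 (index 5) would be a data block and blk_-9223372036854775785 (index 7) a parity block.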

So do you mean we should also remove the parity block info?

> Add EC information to BlockLocation
> -----------------------------------
>
>                 Key: HDFS-12222
>                 URL: https://issues.apache.org/jira/browse/HDFS-12222
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>            Assignee: Huafeng Wang
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-12222.001.patch, HDFS-12222.002.patch, 
> HDFS-12222.003.patch, HDFS-12222.004.patch
>
>
> HDFS applications query block location information to compute splits. One 
> example of this is FileInputFormat:
> https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346
> You see bits of code like this that calculate offsets as follows:
> {noformat}
>     long bytesInThisBlock = blkLocations[startIndex].getOffset() + 
>                           blkLocations[startIndex].getLength() - offset;
> {noformat}
> EC confuses this since the block locations include parity block locations as 
> well, which are not part of the logical file length. This messes up the 
> offset calculation and thus topology/caching information too.
> Applications can figure out what's a parity block by reading the EC policy 
> and then parsing the schema, but it'd be a lot better if we exposed this more 
> generically in BlockLocation instead.
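To make the quoted offset arithmetic concrete, here is a simplified, runnable model of it. It is not Hadoop code: each location is a hypothetical {{offset, length}} pair instead of a {{BlockLocation}}, and {{blockIndex}} only imitates the idea behind {{FileInputFormat}}'s block lookup. It shows that {{bytesInThisBlock}} is only meaningful when offsets and lengths are in logical file bytes. The "with parity" case is an illustrative assumption (a 6 MiB block group whose length wrongly counts 3 MiB of parity), not a description of what HDFS actually returns.

```java
// Hypothetical, simplified model of the FileInputFormat arithmetic quoted above.
// Each location is {offset, length}; this is not Hadoop's BlockLocation.
class SplitOffsetSketch {
    // Index of the location whose [offset, offset + length) range contains
    // the given file offset.
    static int blockIndex(long[][] locs, long offset) {
        for (int i = 0; i < locs.length; i++) {
            long start = locs[i][0];
            long end = start + locs[i][1];
            if (start <= offset && offset < end) {
                return i;
            }
        }
        throw new IllegalArgumentException("Offset " + offset + " is outside the file");
    }

    // The quoted expression: bytes from 'offset' to the end of its block.
    static long bytesInThisBlock(long[][] locs, long offset) {
        int startIndex = blockIndex(locs, offset);
        return locs[startIndex][0] + locs[startIndex][1] - offset;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // One 6 MiB block group of a 6+3 file, described by its logical length.
        long[][] logical = {{0, 6 * mib}};
        // The same group if its length (wrongly) included 3 MiB of parity.
        long[][] withParity = {{0, 9 * mib}};
        long offset = mib;
        System.out.println(bytesInThisBlock(logical, offset));    // 5 MiB remain in the file
        System.out.println(bytesInThisBlock(withParity, offset)); // 8 MiB, past the end of file
    }
}
```

In the second case the computed split would claim more bytes than the file contains, which is the kind of mismatch the description warns about.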



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
