[ 
https://issues.apache.org/jira/browse/HDFS-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179461#comment-16179461
 ] 

Andrew Wang commented on HDFS-12534:
------------------------------------

Hi [~HuafengWang], thanks for taking a look. Let me try to explain in more 
detail, based on my understanding from talking with Marcelo:

* When writing, applications write with awareness of the blocksize. They will 
try to pad to block boundaries, and expect the file to be splittable at the 
block boundaries.
* When reading, applications use the BlockLocations returned by HDFS to 
understand where the split points are.
* With the current EC scheme, an entire block group is represented by a 
single BlockLocation, so we won't get as much parallelism as we'd like. For 
instance, with RS(6,3) and a blocksize of 100MB, a 600MB file would be written 
with six block-sized splits in mind, but readers would see only a single 
BlockLocation covering the entire block group.
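
To make the arithmetic concrete, here is a rough sketch of the "divide the 
file length by the block size" idea from the issue description. It is only an 
illustration: LogicalBlockSketch, Range, and logicalRanges are hypothetical 
names, not HDFS classes, and a real implementation would also need to fill in 
hosts and other BlockLocation fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the HDFS BlockLocation API.
final class LogicalBlockSketch {

    /** A virtual (offset, length) range within a file. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[offset=" + offset + ", length=" + length + "]";
        }
    }

    /**
     * Cuts a file of fileLength bytes into consecutive ranges of blockSize
     * bytes. The last range is shorter when fileLength is not a multiple
     * of blockSize; an empty file yields no ranges.
     */
    static List<Range> logicalRanges(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (fileLength < 0) {
            throw new IllegalArgumentException("fileLength must not be negative");
        }
        List<Range> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            ranges.add(new Range(offset, Math.min(blockSize, fileLength - offset)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // The example above: a 600MB file with a 100MB blocksize.
        for (Range r : logicalRanges(600 * mb, 100 * mb)) {
            System.out.println(r);
        }
    }
}
```

For the 600MB example this produces six ranges, one per 100MB logical block, 
instead of one range spanning the whole block group.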

I haven't looked at FileInputFormat yet to figure out how this works for S3.

> Provide logical BlockLocations for EC files for better split calculation
> ------------------------------------------------------------------------
>
>                 Key: HDFS-12534
>                 URL: https://issues.apache.org/jira/browse/HDFS-12534
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-beta1
>            Reporter: Andrew Wang
>              Labels: hdfs-ec-3.0-must-do
>
> I talked to [~vanzin] and [~alex.behm] some more about split calculation with 
> EC. It turns out HDFS-12222 was resolved prematurely. Applications depend on 
> HDFS BlockLocation to understand where the split points are. The current 
> scheme of returning one BlockLocation per block group loses this information.
> We should change this to provide logical blocks. Divide the file length by 
> the block size and provide suitable BlockLocations to match, with virtual 
> offsets and lengths too.
> I'm not marking this as incompatible, since changing it this way would in 
> fact make it more compatible from the perspective of applications that are 
> scheduling against replicated files. Thus, it'd be good for beta1 if 
> possible, but okay for later too.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
