[ 
https://issues.apache.org/jira/browse/HDFS-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392439#comment-14392439
 ] 

Walter Su commented on HDFS-7782:
---------------------------------

The logic in 005 patch looks good. Great work!
bq.  It will be better to directly read data into the given byte array. You may 
need to extend getFromOneDataNode to achieve this for parallel reading.
I think it's much more complicated than extending getFromOneDataNode.
bq. In the new parallel read mode, check if the read is for more than 1 full 
stripe of cells; if not, directly use the given byte array
It's doable, and I think we should always use the given byte[] buf. If the 
given buf is huge (and the given {{len}} argument is large), creating a 
ByteBuffer of the same size will consume a lot of memory.
I have an idea: we can create more tasks, each reading exactly one cell 
({{cellSize}} bytes). Since the data within one cell is contiguous, each task 
can put it directly at its final location in the given buf, without moving it 
later. Still, the 005 patch works for me; it keeps the code clean. 
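
To make the per-cell idea concrete, here is a rough sketch of how a pread 
range could be split into tasks that never cross a cell boundary. It is not 
taken from any patch: {{CellReadPlan}}, {{CellTask}} and {{plan}} are 
hypothetical names, and it assumes offsets are relative to the start of one 
block group with cells laid out round-robin over {{dataBlkNum}} data blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of DFSInputStream).
final class CellReadPlan {

  /** One read task: a contiguous run of bytes inside a single cell. */
  static final class CellTask {
    final int blockIndex;      // data block in the group that holds the cell
    final long offsetInBlock;  // where to start reading inside that block
    final int bufOffset;       // where the bytes land in the caller's buf
    final int length;          // number of bytes, at most cellSize

    CellTask(int blockIndex, long offsetInBlock, int bufOffset, int length) {
      this.blockIndex = blockIndex;
      this.offsetInBlock = offsetInBlock;
      this.bufOffset = bufOffset;
      this.length = length;
    }
  }

  /**
   * Splits the logical range [start, start + len) into per-cell tasks.
   * Assumes start is relative to the beginning of the block group.
   */
  static List<CellTask> plan(long start, int bufOffset, int len,
                             int cellSize, int dataBlkNum) {
    List<CellTask> tasks = new ArrayList<>();
    long pos = start;
    int dst = bufOffset;
    int remaining = len;
    while (remaining > 0) {
      long cellIdx = pos / cellSize;                  // logical cell number
      int inCell = (int) (pos % cellSize);            // offset inside the cell
      int n = Math.min(remaining, cellSize - inCell); // stop at the cell edge
      int blockIndex = (int) (cellIdx % dataBlkNum);
      long offsetInBlock = (cellIdx / dataBlkNum) * cellSize + inCell;
      tasks.add(new CellTask(blockIndex, offsetInBlock, dst, n));
      pos += n;
      dst += n;
      remaining -= n;
    }
    return tasks;
  }
}
```

Each task already knows its destination in buf, so the tasks could run in 
parallel and write into disjoint regions of the caller's array with no 
intermediate buffer. The price is more, smaller tasks than in the 005 patch.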

Minor (a TODO for the future):
{{MiniDFSCluster.injectBlocks(..)}} fills each block with the repeated byte 
{{DEFAULT_DATABYTE}}, which is not enough. We need to add a test case to 
{{testPread()}} that injects *random* data, so that we can test data 
integrity. This is necessary because the read path converts striped data to 
contiguous data. We can add such test cases later, since they depend on EC 
striped writing. It would look like {{testFillResultBuffer}} but cover more.
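
A small standalone illustration of why a repeated fill byte is too weak. This 
is only a simulation under my own assumptions, not a MiniDFSCluster test: 
{{StripedRoundTrip}}, {{stripe}}, {{unstripe}} and {{buggyRead}} are made-up 
names, and blocks are zero-padded up to a whole number of cells.

```java
// Hypothetical in-memory model of striping, for illustration only.
final class StripedRoundTrip {

  /** Distributes data round-robin, one cell at a time, over the data blocks. */
  static byte[][] stripe(byte[] data, int cellSize, int dataBlkNum) {
    int stripeWidth = cellSize * dataBlkNum;
    int numStripes = (data.length + stripeWidth - 1) / stripeWidth;
    byte[][] blocks = new byte[dataBlkNum][numStripes * cellSize];
    for (int i = 0; i < data.length; i++) {
      int cell = i / cellSize;
      int offsetInBlock = (cell / dataBlkNum) * cellSize + i % cellSize;
      blocks[cell % dataBlkNum][offsetInBlock] = data[i];
    }
    return blocks;
  }

  /** Converts the striped blocks back into len contiguous bytes. */
  static byte[] unstripe(byte[][] blocks, int len, int cellSize) {
    int dataBlkNum = blocks.length;
    byte[] out = new byte[len];
    for (int i = 0; i < len; i++) {
      int cell = i / cellSize;
      int offsetInBlock = (cell / dataBlkNum) * cellSize + i % cellSize;
      out[i] = blocks[cell % dataBlkNum][offsetInBlock];
    }
    return out;
  }

  /** Simulates a buggy reader that mixes up the first two data blocks. */
  static byte[] buggyRead(byte[][] blocks, int len, int cellSize) {
    byte[][] swapped = blocks.clone();  // needs at least two blocks
    swapped[0] = blocks[1];
    swapped[1] = blocks[0];
    return unstripe(swapped, len, cellSize);
  }
}
```

With a file made of one repeated byte whose length is a multiple of the stripe 
width, {{buggyRead}} returns exactly the original bytes, so the bug goes 
unnoticed. With random (or otherwise varying) content the comparison fails.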

btw, I'm a little confused about where each {{groupSize}} comes from.

> Erasure coding: pread from files in striped layout
> --------------------------------------------------
>
>                 Key: HDFS-7782
>                 URL: https://issues.apache.org/jira/browse/HDFS-7782
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7782-000.patch, HDFS-7782-001.patch, 
> HDFS-7782-002.patch, HDFS-7782-003.patch, HDFS-7782-004.patch, 
> HDFS-7782-005.patch
>
>
> If a client wants to read a file, it should not need to know or handle the 
> file's layout. This sub-task adds logic to DFSInputStream to support 
> reading files in striped layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
