[ 
https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583876#comment-15583876
 ] 

Hadoop QA commented on HDFS-1950:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} | {color:red} HDFS-1950 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-1950 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12490392/HDFS-1950.1.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17192/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Blocks that are under construction are not getting read if the blocks are 
> more than 10. Only complete blocks are read properly. 
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1950
>                 URL: https://issues.apache.org/jira/browse/HDFS-1950
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client, namenode
>    Affects Versions: 0.20.205.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: Uma Maheswara Rao G
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-1950-2.patch, HDFS-1950.1.patch, 
> hdfs-1950-0.20-append-tests.txt, hdfs-1950-trunk-test.txt, 
> hdfs-1950-trunk-test.txt
>
>
> Before going to the root cause, let's look at the read behavior for a file 
> having more than 10 blocks in the append case. 
> Logic: 
> ==== 
> There is a prefetch size (dfs.read.prefetch.size) for the DFSInputStream, 
> with a default value of 10. 
> This prefetch size is the number of block locations the client fetches from 
> the NameNode at a time when reading a file. 
> For example, assume a file X with 22 blocks resides in HDFS. 
> The reader first fetches the first 10 block locations from the NameNode and 
> starts reading. 
> After the above step, the reader fetches the next 10 block locations from 
> the NN and continues reading. 
> Then the reader fetches the remaining 2 block locations from the NN and 
> completes the read. 
> Cause: 
> ======= 
> Now let's look at the cause of this issue. 
> The failing scenario is: a writer wrote 10+ complete blocks plus a partial 
> block and called sync; a reader trying to read the file will not see the 
> last partial block. 
> The client first gets 10 block locations from the NN. It then checks whether 
> the file is under construction and, if so, gets the size of the last partial 
> block from the datanode and reads the full file. 
> However, when the number of blocks is more than 10, the last block will not 
> be in the first fetch; it will arrive in a later fetch (the last block is in 
> the (num of blocks / 10)th fetch). 
> The problem is that the DFSClient has no logic to get the size of the last 
> partial block (as it does for the first fetch) for any fetch other than the 
> first, so the reader cannot read all of the data that was synced. 
> Also, the InputStream.available API iterates using the file length computed 
> from the first fetch; ideally this length should be updated as later fetches 
> arrive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
