[ 
https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108380#comment-13108380
 ] 

Uma Maheswara Rao G commented on HDFS-1950:
-------------------------------------------

Hi,

 The fix basically addresses two scenarios:
  
 1) The DFSClient-side change ensures that the partial (under-construction) block is read.

 2) The problem here is that DFSClient decides whether to go for the next fetch based on the block sizes it already knows.

Consider a corner boundary case (take the prefetch size as 10 * blocksize): a file 
whose length is exactly 10.5 or 20.5 ...etc blocks can create a problem, because 
the client will not even attempt the next fetch. Initially it does not know the 
size of that partial block; it knows only the size of the first 10 blocks. 
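To make the arithmetic concrete, here is a minimal, hypothetical Java sketch (the class and method names are made up for illustration, not HDFS code) of the condition that identifies this boundary case:

```java
// Hypothetical sketch of the boundary arithmetic; PrefetchBoundary and
// PREFETCH are illustrative names, not part of HDFS.
public class PrefetchBoundary {
    static final int PREFETCH = 10; // default of dfs.read.prefetch.size

    // True when the file is "n * PREFETCH complete blocks + 1 partial block":
    // every fetch returns exactly PREFETCH blocks whose lengths cover all the
    // complete blocks, so the client never learns about the trailing partial
    // block and never issues the extra fetch.
    static boolean hitsBoundaryCase(int totalBlocks) {
        return totalBlocks > PREFETCH && totalBlocks % PREFETCH == 1;
    }

    public static void main(String[] args) {
        System.out.println(hitsBoundaryCase(11)); // 10.5-style file
        System.out.println(hitsBoundaryCase(12)); // partial block lands inside 2nd fetch
        System.out.println(hitsBoundaryCase(21)); // 20.5-style file
    }
}
```

This mirrors the `blocks.length > PRE_FETCH_SIZE && blocks.length % PRE_FETCH_SIZE == 1` check in the patch below.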

To fix this problem, we introduced one check in FSNamesystem.

{code}
    ......
    ......
    LocatedBlocks createLocatedBlocks = inode.createLocatedBlocks(results);

    createLocatedBlocksForThePartialBlock(inode, blocks, curPos,
        createLocatedBlocks);

    return createLocatedBlocks;
  }

  private void createLocatedBlocksForThePartialBlock(INodeFile inode,
      Block[] blocks, long curPos, LocatedBlocks createLocatedBlocks) {
    int curBlk;
    if (blocks.length > PRE_FETCH_SIZE
        && blocks.length % PRE_FETCH_SIZE == 1
        && createLocatedBlocks.getFileLength() == curPos) {

    .........
    ........
{code}

When the file is exactly 10.5 blocks, FSNamesystem will also populate the id of 
the 0.5th (partial) block. So the client will update the partial block's size 
anyway, and can take care of reading this boundary partial block.
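As a rough illustration of why reporting the partial block helps (the class and method here are hypothetical, not HDFS code): once the client learns the partial block's current length from a datanode, the readable range extends past the complete blocks.

```java
// Hypothetical sketch; ReadableLength is an illustrative name, not an HDFS
// class. It shows the length accounting the client can do once the NameNode
// also reports the trailing under-construction block.
public class ReadableLength {
    static long readableLength(long completeBlocks, long blockSize,
                               long partialBlockBytes) {
        // Complete blocks contribute a full blockSize each; the trailing
        // under-construction block contributes whatever has been synced so far.
        return completeBlocks * blockSize + partialBlockBytes;
    }

    public static void main(String[] args) {
        // e.g. 10 complete 64 MB blocks plus a half-written 32 MB block
        System.out.println(readableLength(10, 64L << 20, 32L << 20));
    }
}
```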

This patch is basically for review. One more problem I wanted to raise here is 
that the prefetch size is a client-side configuration; in this patch I used the 
value 10.

{code}
 prefetchSize = conf.getLong("dfs.read.prefetch.size", prefetchSize);
{code}
For the above reason, I am planning to read this property on the NameNode server 
side as well. Or do you have any other suggestion?

Once the patch is approved, I will prepare it for the 20Append and 205 branches 
with the provided suggestions.


Thanks
Uma



> Blocks that are under construction are not getting read if the blocks are 
> more than 10. Only complete blocks are read properly. 
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1950
>                 URL: https://issues.apache.org/jira/browse/HDFS-1950
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 0.20-append
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: Uma Maheswara Rao G
>            Priority: Blocker
>             Fix For: 0.20-append
>
>         Attachments: HDFS-1950-2.patch, HDFS-1950.1.patch, 
> hdfs-1950-0.20-append-tests.txt, hdfs-1950-trunk-test.txt, 
> hdfs-1950-trunk-test.txt
>
>
> Before going to the root cause, let's see the read behavior for a file having 
> more than 10 blocks in the append case. 
> Logic: 
> ==== 
> There is a prefetch size, dfs.read.prefetch.size, for the DFSInputStream, 
> which has a default value of 10. 
> This prefetch size is the number of blocks that the client will fetch from 
> the namenode when reading a file. 
> For example, assume a file X having 22 blocks resides in HDFS. 
> The reader first fetches the first 10 blocks from the namenode and starts 
> reading. 
> After the above step, the reader fetches the next 10 blocks from the NN and 
> continues reading. 
> Then the reader fetches the remaining 2 blocks from the NN and completes the 
> read. 
> Cause: 
> ======= 
> Let's see the cause of this issue now. 
> The scenario that fails is: "Writer wrote 10+ blocks and a partial block and 
> called sync. A reader trying to read the file will not get the last partial 
> block." 
> The client first gets the 10 block locations from the NN. It then checks 
> whether the file is under construction; if so, it gets the size of the last 
> partial block from a datanode and reads the full file. 
> However, when the number of blocks is more than 10, the last block will not 
> be in the first fetch. It will be in a later fetch (the last block will be 
> in the (num of blocks / 10)th fetch). 
> The problem now is that DFSClient has no logic to get the size of the last 
> partial block (as in the first case) for fetches other than the first, so 
> the reader will not be able to read all of the synced data. 
> Also, the InputStream.available API uses the first fetched block size to 
> iterate; ideally this size has to be increased.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
