[ https://issues.apache.org/jira/browse/HDFS-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841647#comment-17841647 ]

ASF GitHub Bot commented on HDFS-17497:
---------------------------------------

Hexiaoqiao commented on PR #6765:
URL: https://github.com/apache/hadoop/pull/6765#issuecomment-2081489723

   Great catch. I have not reviewed it carefully yet, but I remember this has 
been discussed for a long time. IIRC, the client also checks the file length by 
querying the DataNode that manages the incomplete block? Thanks.
   (I will try to review the PR later.)




> Logic for committed blocks is mixed when computing file size
> ------------------------------------------------------------
>
>                 Key: HDFS-17497
>                 URL: https://issues.apache.org/jira/browse/HDFS-17497
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>
> One in-writing HDFS file may contain multiple committed blocks, as follows 
> (assume one file contains three blocks):
> || ||Block 1||Block 2||Block 3||
> |Case 1|Complete|Committed|UnderConstruction|
> |Case 2|Complete|Committed|Committed|
> |Case 3|Committed|Committed|Committed|
>  
> But the logic for committed blocks is inconsistent when computing the file 
> size: it ignores the bytes of the last committed block, yet counts the bytes 
> of all other committed blocks.
> {code:java}
> public final long computeFileSize(boolean includesLastUcBlock,
>     boolean usePreferredBlockSize4LastUcBlock) {
>   if (blocks.length == 0) {
>     return 0;
>   }
>   final int last = blocks.length - 1;
>   // check if the last block is BlockInfoUnderConstruction
>   BlockInfo lastBlk = blocks[last];
>   long size = lastBlk.getNumBytes();
>   // the last committed block is not complete, so its bytes may be ignored.
>   if (!lastBlk.isComplete()) {
>      if (!includesLastUcBlock) {
>        size = 0;
>      } else if (usePreferredBlockSize4LastUcBlock) {
>        size = isStriped()?
>            getPreferredBlockSize() *
>                ((BlockInfoStriped)lastBlk).getDataBlockNum() :
>            getPreferredBlockSize();
>      }
>   }
>   // The bytes of other committed blocks are calculated into the file length.
>   for (int i = 0; i < last; i++) {
>     size += blocks[i].getNumBytes();
>   }
>   return size;
> } {code}
> The bytes of a committed block will never change, so the bytes of the last 
> committed block should be counted in the file length too.
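>
> As a hedged sketch of that unification (not the actual patch; it reuses the 
> existing {{BlockInfo.getBlockUCState()}} API), a COMMITTED last block could 
> simply be treated like a COMPLETE one:
> {code:java}
> public final long computeFileSize(boolean includesLastUcBlock,
>     boolean usePreferredBlockSize4LastUcBlock) {
>   if (blocks.length == 0) {
>     return 0;
>   }
>   final int last = blocks.length - 1;
>   BlockInfo lastBlk = blocks[last];
>   long size = lastBlk.getNumBytes();
>   // A committed block's length is final, so only a block that is still
>   // genuinely under construction needs the special handling below.
>   if (!lastBlk.isComplete()
>       && lastBlk.getBlockUCState() != BlockUCState.COMMITTED) {
>     if (!includesLastUcBlock) {
>       size = 0;
>     } else if (usePreferredBlockSize4LastUcBlock) {
>       size = isStriped()
>           ? getPreferredBlockSize()
>               * ((BlockInfoStriped) lastBlk).getDataBlockNum()
>           : getPreferredBlockSize();
>     }
>   }
>   for (int i = 0; i < last; i++) {
>     size += blocks[i].getNumBytes();
>   }
>   return size;
> }
> {code}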
>  
> The logic for committed blocks is also mixed when computing the file length 
> in DFSInputStream. Normally DFSInputStream does not need to fetch the visible 
> length for a committed block, regardless of whether the committed block is 
> the last block or not.
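>
> A rough client-side sketch of that idea ({{isLastBlockCommitted}} and 
> {{readBlockLengthFromDataNodes}} are illustrative names, not the real 
> DFSInputStream API):
> {code:java}
> // Resolve the length of the last block of an in-writing file on the client.
> long resolveLastBlockLength(LocatedBlocks located) throws IOException {
>   LocatedBlock last = located.getLastLocatedBlock();
>   if (last == null) {
>     return 0;
>   }
>   if (isLastBlockCommitted(located)) {
>     // A committed block's length is already final on the NameNode, so the
>     // size it reported can be trusted without contacting any DataNode.
>     return last.getBlockSize();
>   }
>   // Only a genuinely under-construction last block requires querying the
>   // DataNodes that hold its replicas for the visible length.
>   return readBlockLengthFromDataNodes(last);
> }
> {code}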
>  
> -HDFS-10843- encountered a bug that was actually caused by a committed 
> block, but -HDFS-10843- fixed it by updating the quota usage when completing 
> the block. The number of bytes of a committed block will no longer change, 
> so we should update the quota usage as soon as the block is committed, which 
> corrects the quota usage delta earlier.
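>
> A minimal sketch of the quota idea ({{updateSpaceConsumed}} here is 
> illustrative shorthand for the NameNode's quota bookkeeping, not an exact 
> signature):
> {code:java}
> // Once a block moves to COMMITTED its length is final, so the space that
> // was reserved at the preferred (full) block size can be trimmed right
> // away instead of waiting for the block to become COMPLETE.
> void updateQuotaOnCommitBlock(INodeFile file, BlockInfo committed) {
>   long reserved = file.getPreferredBlockSize();  // charged when allocated
>   long actual = committed.getNumBytes();         // final once committed
>   long delta = (actual - reserved) * file.getFileReplication();
>   if (delta != 0) {
>     updateSpaceConsumed(file, delta);  // illustrative quota update hook
>   }
> }
> {code}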
>  
> So there are some things we need to do:
>  * Unify the calculation logic for all committed blocks in 
> {{computeFileSize}} of {{INodeFile}}
>  * Unify the calculation logic for all committed blocks in {{getFileLength}} 
> of {{DFSInputStream}}
>  * Update quota usage when committing block



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
