[jira] [Commented] (HDFS-17497) Logic for committed blocks is mixed when computing file size

ASF GitHub Bot (Jira) Wed, 21 Aug 2024 19:31:10 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875714#comment-17875714
 ]


ASF GitHub Bot commented on HDFS-17497:
---------------------------------------

hfutatzhanghb commented on code in PR #6765:
URL: https://github.com/apache/hadoop/pull/6765#discussion_r1726157523


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java:
##########
@@ -1105,15 +1106,12 @@ static void unprotectedUpdateCount(INodesInPath 
inodesInPath,
   /**
    * Update the cached quota space for a block that is being completed.
    * Must only be called once, as the block is being completed.
-   * @param completeBlk - Completed block for which to update space
-   * @param inodes - INodes in path to file containing completeBlk; if null
-   *                 this will be resolved internally
+   * @param commitBlock - Committed block for which to update space
+   * @param iip - INodes in path to file containing committedBlock
    */
-  public void updateSpaceForCompleteBlock(BlockInfo completeBlk,
-      INodesInPath inodes) throws IOException {
+  public void updateSpaceForCommittedBlock(Block commitBlock,
+      INodesInPath iip) throws IOException {
     assert namesystem.hasWriteLock();
-    INodesInPath iip = inodes != null ? inodes :
-        INodesInPath.fromINode(namesystem.getBlockCollection(completeBlk));
     INodeFile fileINode = iip.getLastINode().asFile();

Review Comment:
   In original logic, it will consider whether iip is null or not.  But we miss 
judgement here.





> Logic for committed blocks is mixed when computing file size
> ------------------------------------------------------------
>
>                 Key: HDFS-17497
>                 URL: https://issues.apache.org/jira/browse/HDFS-17497
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>
> One in-writing HDFS file may contains multiple committed blocks, as follows 
> (assume one file contains three blocks):
> || ||Block 1||Block 2||Block 3||
> |Case 1|Complete|Commit|UnderConstruction|
> |Case 2|Complete|Commit|Commit|
> |Case 3|Commit|Commit|Commit|
>  
> But the logic for committed blocks is mixed when computing file size, it 
> ignores the bytes of the last committed block and contains the bytes of other 
> committed blocks.
> {code:java}
> public final long computeFileSize(boolean includesLastUcBlock,
>     boolean usePreferredBlockSize4LastUcBlock) {
>   if (blocks.length == 0) {
>     return 0;
>   }
>   final int last = blocks.length - 1;
>   //check if the last block is BlockInfoUnderConstruction
>   BlockInfo lastBlk = blocks[last];
>   long size = lastBlk.getNumBytes();
>   // the last committed block is not complete, so it's bytes may be ignored.
>   if (!lastBlk.isComplete()) {
>      if (!includesLastUcBlock) {
>        size = 0;
>      } else if (usePreferredBlockSize4LastUcBlock) {
>        size = isStriped()?
>            getPreferredBlockSize() *
>                ((BlockInfoStriped)lastBlk).getDataBlockNum() :
>            getPreferredBlockSize();
>      }
>   }
>   // The bytes of other committed blocks are calculated into the file length.
>   for (int i = 0; i < last; i++) {
>     size += blocks[i].getNumBytes();
>   }
>   return size;
> } {code}
> The bytes of one committed block will not be changed, so the bytes of the 
> last committed block should be calculated into the file length too.
>  
> And the logic for committed blocks is mixed too when computing file length in 
> DFSInputStream. Normally DFSInputStream does not need to get visible length 
> for committed block regardless of whether the committed block is the last 
> block or not.
>  
> -HDFS-10843- encountered one bug which actually caused by the committed 
> block, but -HDFS-10843- fixed that bug by updating quota usage when 
> completing block. The num of bytes of the committed block will no longer 
> change, so we should update the quota usage when the block is committed, 
> which can reduce the delta quota usage in time.
>  
> So there are somethings we need to do:
>  * Unify the calculation logic for all committed blocks in 
> {{computeFileSize}} of {{INodeFile}}
>  * Unify the calculation logic for all committed blocks in {{getFileLength}} 
> of {{DFSInputStream}}
>  * Update quota usage when committing block



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-17497) Logic for committed blocks is mixed when computing file size

Reply via email to