ZanderXu opened a new pull request, #6765:
URL: https://github.com/apache/hadoop/pull/6765

   <p>One in-writing HDFS file may contains multiple committed blocks, as 
follows (assume one file contains three blocks):</p>
   <div class="table-wrap">
   
     | Block 1 | Block 2 | Block 3
   -- | -- | -- | --
   Case 1 | Complete | Commit | UnderConstruction
   Case 2 | Complete | Commit | Commit
   Case 3 | Commit | Commit | Commit
   
   
   </div>
   <p>&nbsp;</p>
   <p>But the logic for committed blocks is mixed when computing file size, it 
ignores the bytes of the last committed block and contains the bytes of other 
committed blocks.</p>
   <pre class="code panel" style="border-width: 1px;" 
data-language="code-java"><span class="code-keyword">public</span> <span 
class="code-keyword">final</span> <span class="code-object">long</span> 
computeFileSize(<span class="code-object">boolean</span> includesLastUcBlock,
       <span class="code-object">boolean</span> 
usePreferredBlockSize4LastUcBlock) {
     <span class="code-keyword">if</span> (blocks.length == 0) {
       <span class="code-keyword">return</span> 0;
     }
     <span class="code-keyword">final</span> <span 
class="code-object">int</span> last = blocks.length - 1;
     <span class="code-comment">//check <span class="code-keyword">if</span> 
the last block is BlockInfoUnderConstruction
   </span>  BlockInfo lastBlk = blocks[last];
     <span class="code-object">long</span> size = lastBlk.getNumBytes();
     <span class="code-comment">// the last committed block is not complete, so 
it's bytes may be ignored.
   </span>  <span class="code-keyword">if</span> (!lastBlk.isComplete()) {
        <span class="code-keyword">if</span> (!includesLastUcBlock) {
          size = 0;
        } <span class="code-keyword">else</span> <span 
class="code-keyword">if</span> (usePreferredBlockSize4LastUcBlock) {
          size = isStriped()?
              getPreferredBlockSize() *
                  ((BlockInfoStriped)lastBlk).getDataBlockNum() :
              getPreferredBlockSize();
        }
     }
     <span class="code-comment">// The bytes of other committed blocks are 
calculated into the file length.
   </span>  <span class="code-keyword">for</span> (<span 
class="code-object">int</span> i = 0; i &lt; last; i++) {
       size += blocks[i].getNumBytes();
     }
     <span class="code-keyword">return</span> size;
   } </pre>
   <p>The bytes of one committed block will not be changed, so the bytes of the 
last committed block should be calculated into the file length too.</p>
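   A minimal sketch of that direction (hypothetical and simplified, not this PR's actual diff; it assumes the last block's state can be checked via `getBlockUCState()`): a committed last block keeps its final length, and only a last block that is genuinely under construction gets the special-case handling.

   ```java
   // Sketch only -- simplified from computeFileSize() above, not the real patch.
   public final long computeFileSizeSketch(boolean includesLastUcBlock,
       boolean usePreferredBlockSize4LastUcBlock) {
     if (blocks.length == 0) {
       return 0;
     }
     final int last = blocks.length - 1;
     BlockInfo lastBlk = blocks[last];
     long size = lastBlk.getNumBytes();
     // A committed block's length is final, so only a last block that is
     // still truly under construction needs special handling.
     boolean lastIsCommitted =
         lastBlk.getBlockUCState() == BlockUCState.COMMITTED;
     if (!lastBlk.isComplete() && !lastIsCommitted) {
       if (!includesLastUcBlock) {
         size = 0;
       } else if (usePreferredBlockSize4LastUcBlock) {
         size = isStriped()
             ? getPreferredBlockSize()
                 * ((BlockInfoStriped) lastBlk).getDataBlockNum()
             : getPreferredBlockSize();
       }
     }
     // Complete and committed blocks before the last one count fully, as before.
     for (int i = 0; i < last; i++) {
       size += blocks[i].getNumBytes();
     }
     return size;
   }
   ```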
   <p>&nbsp;</p>
   <p>And the logic for committed blocks is mixed too when computing file 
length in DFSInputStream. Normally DFSInputStream doesn't get visible length 
for committed block regardless of whether the committed block is the last block 
or not.</p>
   <p>&nbsp;</p>
   <p><a class="issue-link" title="Update space quota when a UC block is 
completed rather than committed." 
href="https://issues.apache.org/jira/browse/HDFS-10843"; 
data-issue-key="HDFS-10843"><del>HDFS-10843</del></a> noticed one bug which 
actually caused by the committed block, but <a class="issue-link" title="Update 
space quota when a UC block is completed rather than committed." 
href="https://issues.apache.org/jira/browse/HDFS-10843"; 
data-issue-key="HDFS-10843"><del>HDFS-10843</del></a> fixed that bug in another 
way.</p>
   <p>The num of bytes of the committed block will no longer change, so we 
should update the quota usage when the block is committed, which can reduce the 
delta quota usage in time.</p>
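   As a back-of-the-envelope illustration (names hypothetical; the real accounting goes through the NameNode's quota machinery), the space quota that becomes releasable at commit time is the unused remainder of the per-block reservation times the replication factor:

   ```java
   /** Illustrative only; not the NameNode's actual quota API. */
   class CommitQuotaSketch {
     /**
      * Space quota (in bytes) that can be released as soon as a block is
      * committed, instead of waiting for the block to complete.
      */
     static long releasableSpaceOnCommit(long preferredBlockSize,
         long committedBytes, short replication) {
       // While under construction, the block is charged at the full preferred
       // block size. Once committed, its length is final, so the unused part
       // of the reservation can be returned immediately.
       return (preferredBlockSize - committedBytes) * replication;
     }
   }
   ```

   For example, with a 128 MiB preferred block size, a block committed at 1 MiB, and replication 3, roughly 381 MiB of space quota could be reclaimed at commit time instead of at completion.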
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

