ZanderXu opened a new pull request, #6765:
URL: https://github.com/apache/hadoop/pull/6765
An HDFS file that is still being written may contain multiple committed blocks, as follows (assume the file contains three blocks):
|        | Block 1  | Block 2 | Block 3           |
| ------ | -------- | ------- | ----------------- |
| Case 1 | Complete | Commit  | UnderConstruction |
| Case 2 | Complete | Commit  | Commit            |
| Case 3 | Commit   | Commit  | Commit            |
But the handling of committed blocks is inconsistent when computing the file size: the bytes of the last block are ignored when it is committed, while the bytes of every other committed block are counted.
```java
public final long computeFileSize(boolean includesLastUcBlock,
    boolean usePreferredBlockSize4LastUcBlock) {
  if (blocks.length == 0) {
    return 0;
  }
  final int last = blocks.length - 1;
  // Check if the last block is BlockInfoUnderConstruction.
  BlockInfo lastBlk = blocks[last];
  long size = lastBlk.getNumBytes();
  // The last block is not complete, so its bytes may be ignored,
  // even when the block is already committed.
  if (!lastBlk.isComplete()) {
    if (!includesLastUcBlock) {
      size = 0;
    } else if (usePreferredBlockSize4LastUcBlock) {
      size = isStriped() ?
          getPreferredBlockSize() *
              ((BlockInfoStriped) lastBlk).getDataBlockNum() :
          getPreferredBlockSize();
    }
  }
  // The bytes of all the other blocks, including committed ones,
  // are counted into the file length.
  for (int i = 0; i < last; i++) {
    size += blocks[i].getNumBytes();
  }
  return size;
}
```
The length of a committed block can no longer change, so the bytes of the last committed block should be counted into the file length too.
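A minimal sketch of the intended change in computeFileSize() (the COMMITTED check below is shorthand for the behavior; the actual patch may be structured differently):

```java
// Sketch only: count the last block whenever it is COMMITTED,
// because a committed block's length is already final.
// BlockUCState comes from HdfsServerConstants in the real code.
if (!lastBlk.isComplete()) {
  if (lastBlk.getBlockUCState() == BlockUCState.COMMITTED) {
    // Committed: the length is final, so keep
    // size = lastBlk.getNumBytes() from above.
  } else if (!includesLastUcBlock) {
    size = 0;
  } else if (usePreferredBlockSize4LastUcBlock) {
    size = isStriped() ?
        getPreferredBlockSize() *
            ((BlockInfoStriped) lastBlk).getDataBlockNum() :
        getPreferredBlockSize();
  }
}
```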
The handling of committed blocks is similarly inconsistent when DFSInputStream computes the file length: normally DFSInputStream does not obtain the visible length for a committed block, regardless of whether the committed block is the last block or not.
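For reference, a condensed sketch of the current DFSInputStream logic (getFileLength() and readBlockLength() are real methods there, but the structure below is heavily simplified):

```java
// Condensed from DFSInputStream, simplified for illustration.
public synchronized long getFileLength() {
  // Completed blocks plus whatever is known about the last block
  // still being written.
  return locatedBlocks == null ? 0
      : locatedBlocks.getFileLength() + lastBlockBeingWrittenLength;
}

// While opening a file under construction, the length of an
// incomplete last block is fetched from the datanodes via
// readBlockLength(lastBlock), even if that block is already
// committed and its final length is known to the NameNode.
```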
[HDFS-10843](https://issues.apache.org/jira/browse/HDFS-10843) ("Update space quota when a UC block is completed rather than committed") ran into a bug that was actually caused by committed blocks, but it fixed that bug in another way.
The number of bytes of a committed block will no longer change, so we should update the quota usage when the block is committed, which releases the reserved-but-unused quota sooner.
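A minimal sketch of the proposed timing, assuming the existing FSDirectory#updateSpaceConsumed() helper (the exact call site in the patch may differ):

```java
// Sketch: release the over-reserved space quota as soon as the last
// block is COMMITTED, since its length can no longer change.
// `file`, `iip` and `committedBlock` stand for the in-scope objects
// at the commit call site; updateSpaceConsumed() is the existing
// FSDirectory helper.
final long diff =
    file.getPreferredBlockSize() - committedBlock.getNumBytes();
if (diff > 0) {
  // A negative ssDelta shrinks the consumed space quota; it is
  // scaled by the file's replication factor.
  fsd.updateSpaceConsumed(iip, 0, -diff, file.getFileReplication());
}
```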