ZanderXu created HDFS-17497:
-------------------------------
Summary: Logic for committed blocks is mixed when computing file
size
Key: HDFS-17497
URL: https://issues.apache.org/jira/browse/HDFS-17497
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: ZanderXu
An HDFS file that is still being written may contain multiple committed
blocks, as follows (assume the file contains three blocks):
|| ||Block 1||Block 2||Block 3||
|Case 1|Complete|Commit|UnderConstruction|
|Case 2|Complete|Commit|Commit|
|Case 3|Commit|Commit|Commit|
However, committed blocks are handled inconsistently when computing the file
size: when includesLastUcBlock is false, the bytes of the last committed block
are ignored, while the bytes of the other committed blocks are included.
{code:java}
public final long computeFileSize(boolean includesLastUcBlock,
    boolean usePreferredBlockSize4LastUcBlock) {
  if (blocks.length == 0) {
    return 0;
  }
  final int last = blocks.length - 1;
  // check if the last block is BlockInfoUnderConstruction
  BlockInfo lastBlk = blocks[last];
  long size = lastBlk.getNumBytes();
  // the last committed block is not complete, so its bytes may be ignored.
  if (!lastBlk.isComplete()) {
    if (!includesLastUcBlock) {
      size = 0;
    } else if (usePreferredBlockSize4LastUcBlock) {
      size = isStriped() ?
          getPreferredBlockSize() *
              ((BlockInfoStriped) lastBlk).getDataBlockNum() :
          getPreferredBlockSize();
    }
  }
  // The bytes of other committed blocks are calculated into the file length.
  for (int i = 0; i < last; i++) {
    size += blocks[i].getNumBytes();
  }
  return size;
}
{code}
The number of bytes in a committed block no longer changes, so the bytes of
the last committed block should be counted in the file length too.
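To make the inconsistency concrete, here is a minimal, self-contained sketch
that models Case 2 from the table. It is not the HDFS implementation: the
State enum, the Block class, and the currentSize/proposedSize helpers are
hypothetical stand-ins for BlockInfo and computeFileSize, and the sketch only
covers the includesLastUcBlock == false path for non-striped files.

```java
public class CommittedBlockSizeSketch {
  // Hypothetical simplification of the HDFS block states discussed above.
  enum State { COMPLETE, COMMITTED, UNDER_CONSTRUCTION }

  // Hypothetical stand-in for BlockInfo: just a state and a byte count.
  static final class Block {
    final State state;
    final long numBytes;

    Block(State state, long numBytes) {
      this.state = state;
      this.numBytes = numBytes;
    }
  }

  // Mirrors the quoted logic with includesLastUcBlock == false:
  // a last block that is not COMPLETE contributes 0 bytes.
  static long currentSize(Block[] blocks) {
    if (blocks.length == 0) {
      return 0;
    }
    final int last = blocks.length - 1;
    long size = blocks[last].state == State.COMPLETE
        ? blocks[last].numBytes : 0;
    for (int i = 0; i < last; i++) {
      size += blocks[i].numBytes;
    }
    return size;
  }

  // Sketch of the proposal: only a last block that is still
  // UNDER_CONSTRUCTION is skipped; a COMMITTED last block is counted.
  static long proposedSize(Block[] blocks) {
    if (blocks.length == 0) {
      return 0;
    }
    final int last = blocks.length - 1;
    long size = blocks[last].state == State.UNDER_CONSTRUCTION
        ? 0 : blocks[last].numBytes;
    for (int i = 0; i < last; i++) {
      size += blocks[i].numBytes;
    }
    return size;
  }

  public static void main(String[] args) {
    // Case 2: Complete | Commit | Commit, with illustrative sizes.
    Block[] case2 = {
        new Block(State.COMPLETE, 128),
        new Block(State.COMMITTED, 128),
        new Block(State.COMMITTED, 64),
    };
    System.out.println("current:  " + currentSize(case2));  // 256
    System.out.println("proposed: " + proposedSize(case2)); // 320
  }
}
```

In this model the middle committed block is counted either way, while the last
committed block is counted only under the proposed behavior.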
Committed blocks are also handled inconsistently when DFSInputStream computes
the file length. Normally DFSInputStream does not fetch the visible length of
a committed block, regardless of whether the committed block is the last block
or not.
HDFS-10843 noticed a bug that was actually caused by a committed block, but
HDFS-10843 fixed that bug in a different way.
Since the number of bytes in a committed block no longer changes, we should
update the quota usage as soon as the block is committed, which promptly
reduces the quota usage delta.
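As a rough illustration of the quota delta, the sketch below assumes that a
block being written is charged against the space quota at the full preferred
block size per replica, and that the unused part can be released once the
final length is known. The class and the releasedOnCommit helper are
hypothetical and are not part of the HDFS API.

```java
public class QuotaDeltaSketch {
  // Hypothetical helper: bytes of space quota that could be released
  // once a block's final length is known at commit time.
  // Assumption: the block was charged preferredBlockSize per replica.
  static long releasedOnCommit(long preferredBlockSize,
      long committedBytes, short replication) {
    long unused = preferredBlockSize - committedBytes;
    return unused > 0 ? unused * replication : 0;
  }

  public static void main(String[] args) {
    final long mb = 1024L * 1024L;
    // Illustrative values: 128 MB preferred size, 1 MB written, 3 replicas.
    long released = releasedOnCommit(128 * mb, 1 * mb, (short) 3);
    System.out.println("released MB: " + released / mb); // 381
  }
}
```

With these illustrative numbers, deferring the update until the block is
complete would leave 381 MB of quota charged but unused in the meantime.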