[
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196655#comment-14196655
]
Jing Zhao commented on HDFS-7056:
---------------------------------
Thanks for working on this, [~shv] and [~zero45]. So far I have just gone through
the namenode snapshot part (INode, INodeFile, FileDiff, and FileDiffList), and I
will continue reviewing the rest.
# Looks like {{findLaterSnapshotWithBlocks}} and
{{findEarlierSnapshotWithBlocks}} are always coupled with
{{FileDiff#getBlocks}}. Maybe we can combine them, wrapping logic like the
following into two methods such as {{findBlocksAfter}} and
{{findBlocksBefore}}?
{code}
+ FileDiff diff = getDiffs().getDiffById(snapshot);
+ BlockInfo[] snapshotBlocks = diff == null ? getBlocks() : diff.getBlocks();
+ if(snapshotBlocks != null)
+ return snapshotBlocks;
+ // Blocks are not in the current snapshot
+ // Find next snapshot with blocks present or return current file blocks
+ diff = getDiffs().findLaterSnapshotWithBlocks(diff.getSnapshotId());
+ snapshotBlocks = (diff == null) ? getBlocks() : diff.getBlocks();
{code}
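To illustrate the shape I have in mind, here is a rough, self-contained sketch of the {{findBlocksAfter}} half. The {{BlockInfo}}/{{FileDiff}} classes below are simplified stand-ins for the real HDFS types, and the method name is only a suggestion:

```java
import java.util.List;

// Simplified stand-ins for the real HDFS classes, just to show the API shape.
class BlockInfo {}

class FileDiff {
    final int snapshotId;
    final BlockInfo[] blocks;   // null if this diff recorded no block list
    FileDiff(int id, BlockInfo[] b) { snapshotId = id; blocks = b; }
}

class FileDiffListSketch {
    private final List<FileDiff> diffs;       // sorted by snapshotId, ascending
    private final BlockInfo[] currentBlocks;  // blocks of the current file state

    FileDiffListSketch(List<FileDiff> diffs, BlockInfo[] currentBlocks) {
        this.diffs = diffs;
        this.currentBlocks = currentBlocks;
    }

    /**
     * Blocks recorded at the given snapshot, or by the first later diff
     * that has a block list; falls back to the current file blocks.
     */
    BlockInfo[] findBlocksAfter(int snapshotId) {
        for (FileDiff d : diffs) {
            if (d.snapshotId >= snapshotId && d.blocks != null) {
                return d.blocks;
            }
        }
        return currentBlocks;
    }
}
```

This would let callers get the block array in one call instead of repeating the null-check-then-search pattern.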
# Since the same block can be included in different file diffs, we may have
duplicated blocks in {{collectedBlocks}}. Will this lead to duplicate
records in the invalidated block list?
{code}
public void destroyAndCollectSnapshotBlocks(
BlocksMapUpdateInfo collectedBlocks) {
for(FileDiff d : asList())
d.destroyAndCollectSnapshotBlocks(collectedBlocks);
}
{code}
# {{INodeFile#destroyAndCollectBlocks}} destroys the whole file, including the
file diffs for snapshots. Thus we do not need to call {{collectBlocksAndClear}}
or define a new {{destroyAndCollectAllBlocks}} method. Instead, we can simply
destroy all the blocks belonging to the current file first, then check whether
calling {{sf.getDiffs().destroyAndCollectSnapshotBlocks}} is necessary.
{code}
+ FileWithSnapshotFeature sf = getFileWithSnapshotFeature();
+ if(sf == null || getDiffs().asList().isEmpty()) {
+ destroyAndCollectAllBlocks(collectedBlocks, removedINodes);
+ return;
+ }
+ sf.getDiffs().destroyAndCollectSnapshotBlocks(collectedBlocks);
{code}
# How do we currently calculate/update quota for a file? I guess we need to
update the quota calculation algorithm for an INodeFile here.
# I guess the semantics of {{findEarlierSnapshotWithBlocks}} is to find the
FileDiff that satisfies: 1) its block list is not null, and 2) its snapshot id
is less than the given {{snapshotId}}. However, if the given {{snapshotId}} is
not {{CURRENT_STATE_ID}}, the current implementation may return a FileDiff
whose snapshot id is >= the given {{snapshotId}} (since {{getDiffById}} may
return a diff with snapshot id greater than the given id).
{code}
public FileDiff findEarlierSnapshotWithBlocks(int snapshotId) {
FileDiff diff = (snapshotId == Snapshot.CURRENT_STATE_ID) ?
getLast() : getDiffById(snapshotId);
BlockInfo[] snapshotBlocks = null;
while(diff != null) {
snapshotBlocks = diff.getBlocks();
if(snapshotBlocks != null)
break;
int p = getPrior(diff.getSnapshotId(), true);
diff = (p == Snapshot.NO_SNAPSHOT_ID) ? null : getDiffById(p);
}
return diff;
}
{code}
# Still for {{findEarlierSnapshotWithBlocks}}: because {{getPrior}} is
currently a {{log\(n\)}} operation, the worst-case time complexity can be
{{nlog\(n\)}}. Considering that the snapshot diff list is usually not big (we
have an upper limit on the total number of snapshots), we may consider doing
a linear scan over the file diff list instead.
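To make both of the above points concrete, here is a standalone sketch of a linear-scan version that also enforces the strict "snapshot id less than {{snapshotId}}" bound. The types are again simplified stand-ins, and {{CURRENT_STATE_ID}} is modeled as {{Integer.MAX_VALUE}} so that it compares greater than every real snapshot id:

```java
import java.util.List;

class FileDiffSketch {
    final int snapshotId;
    final Object[] blocks;   // null if this diff recorded no block list
    FileDiffSketch(int id, Object[] b) { snapshotId = id; blocks = b; }
}

class EarlierBlocksFinder {
    // Stand-in for Snapshot.CURRENT_STATE_ID; larger than any real snapshot id.
    static final int CURRENT_STATE_ID = Integer.MAX_VALUE;

    /**
     * Latest diff with a non-null block list whose snapshot id is strictly
     * less than snapshotId. Scans the sorted diff list backwards in O(n)
     * instead of repeated O(log n) getPrior lookups.
     */
    static FileDiffSketch findEarlierSnapshotWithBlocks(
            List<FileDiffSketch> diffs, int snapshotId) {
        for (int i = diffs.size() - 1; i >= 0; i--) {
            FileDiffSketch d = diffs.get(i);
            if (d.snapshotId < snapshotId && d.blocks != null) {
                return d;
            }
        }
        return null;   // no earlier diff recorded a block list
    }
}
```

With the strict {{<}} comparison, a diff at exactly the given id (or a later one returned by {{getDiffById}}) can never be mistaken for an earlier snapshot.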
# In INode.java, why do we need the following change?
{code}
public final boolean isInLatestSnapshot(final int latestSnapshotId) {
- if (latestSnapshotId == Snapshot.CURRENT_STATE_ID) {
+ if (latestSnapshotId == Snapshot.CURRENT_STATE_ID ||
+ latestSnapshotId == Snapshot.NO_SNAPSHOT_ID) {
{code}
# Nit: need to add "\{" and "\}" for the while loop according to our current
coding style. Similar for several other places (e.g.,
{{FileDiffList#destroyAndCollectSnapshotBlocks}}).
{code}
+ while(i < currentBlocks.length && i < snapshotBlocks.length &&
+ currentBlocks[i] == snapshotBlocks[i])
+ i++;
+ // Collect the remaining blocks of the file
+ while(i < currentBlocks.length)
+ collectedBlocks.addDeleteBlock(currentBlocks[i++]);
{code}
# Minor: In the following code, instead of calling {{getDiffById}} to search
for the file diff, we can let {{AbstractINodeDiffList#saveSelf2Snapshot}}
return the diff it just finds/creates.
{code}
public void saveSelf2Snapshot(int latestSnapshotId, INodeFile iNodeFile,
INodeFileAttributes snapshotCopy, boolean withBlocks)
throws QuotaExceededException {
super.saveSelf2Snapshot(latestSnapshotId, iNodeFile, snapshotCopy);
if(! withBlocks) return;
final FileDiff diff = getDiffById(latestSnapshotId);
// Store blocks if this is the first update
diff.setBlocks(iNodeFile.getBlocks());
}
{code}
# Nit: also, for the above code, maybe it's more direct to have
{code}
if (withBlocks) {
final FileDiff diff = getDiffById(latestSnapshotId);
// Store blocks if this is the first update
diff.setBlocks(iNodeFile.getBlocks());
}
{code}
> Snapshot support for truncate
> -----------------------------
>
> Key: HDFS-7056
> URL: https://issues.apache.org/jira/browse/HDFS-7056
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Konstantin Shvachko
> Assignee: Plamen Jeliazkov
> Attachments: HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx
>
>
> Implementation of truncate in HDFS-3107 does not allow truncating files which
> are in a snapshot. It is desirable to be able to truncate and still keep the
> old state of the file in the snapshot.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)