[
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196655#comment-14196655
]
Jing Zhao commented on HDFS-7056:
---------------------------------
Thanks for working on this, [~shv] and [~zero45]. So far I have just gone through
the namenode snapshot part (INode, INodeFile, FileDiff, and FileDiffList), and I
will continue reviewing the rest.
# Looks like {{findLaterSnapshotWithBlocks}} and
{{findEarlierSnapshotWithBlocks}} are always coupled with
{{FileDiff#getBlocks}}. Maybe we can combine them, wrapping logic like the
following into two methods such as {{findBlocksAfter}} and
{{findBlocksBefore}}?
{code}
+ FileDiff diff = getDiffs().getDiffById(snapshot);
+ BlockInfo[] snapshotBlocks = diff == null ? getBlocks() : diff.getBlocks();
+ if(snapshotBlocks != null)
+ return snapshotBlocks;
+ // Blocks are not in the current snapshot
+ // Find next snapshot with blocks present or return current file blocks
+ diff = getDiffs().findLaterSnapshotWithBlocks(diff.getSnapshotId());
+ snapshotBlocks = (diff == null) ? getBlocks() : diff.getBlocks();
{code}
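To illustrate the shape I have in mind, here is a rough, self-contained sketch of the {{findBlocksAfter}} half. The {{BlockInfo}}/{{FileDiff}} classes below are simplified stand-ins for the real HDFS types, and the method name is only a suggestion:

```java
import java.util.List;

// Simplified stand-ins for the real HDFS classes, just to show the API shape.
class BlockInfo {}

class FileDiff {
    final int snapshotId;
    final BlockInfo[] blocks;   // null if this diff recorded no block list
    FileDiff(int id, BlockInfo[] b) { snapshotId = id; blocks = b; }
}

class FileDiffListSketch {
    private final List<FileDiff> diffs;       // sorted by snapshotId, ascending
    private final BlockInfo[] currentBlocks;  // blocks of the current file state

    FileDiffListSketch(List<FileDiff> diffs, BlockInfo[] currentBlocks) {
        this.diffs = diffs;
        this.currentBlocks = currentBlocks;
    }

    /**
     * Blocks recorded at the given snapshot, or by the first later diff
     * that has a block list; falls back to the current file blocks.
     */
    BlockInfo[] findBlocksAfter(int snapshotId) {
        for (FileDiff d : diffs) {
            if (d.snapshotId >= snapshotId && d.blocks != null) {
                return d.blocks;
            }
        }
        return currentBlocks;
    }
}
```

This would let callers get the block array in one call instead of repeating the null-check-then-search pattern.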
# Since the same block can be included in different file diffs, we may have
duplicated blocks in {{collectedBlocks}}. Will this lead to duplicate
records in the invalidated block list?
{code}
public void destroyAndCollectSnapshotBlocks(
BlocksMapUpdateInfo collectedBlocks) {
for(FileDiff d : asList())
d.destroyAndCollectSnapshotBlocks(collectedBlocks);
}
{code}
# {{INodeFile#destroyAndCollectBlocks}} destroys the whole file, including the
file diffs for snapshots. Thus we do not need to call {{collectBlocksAndClear}}
or define a new {{destroyAndCollectAllBlocks}} method. Instead, we can simply
destroy all the blocks belonging to the current file first, then check whether
calling {{sf.getDiffs().destroyAndCollectSnapshotBlocks}} is necessary.
{code}
+ FileWithSnapshotFeature sf = getFileWithSnapshotFeature();
+ if(sf == null || getDiffs().asList().isEmpty()) {
+ destroyAndCollectAllBlocks(collectedBlocks, removedINodes);
+ return;
+ }
+ sf.getDiffs().destroyAndCollectSnapshotBlocks(collectedBlocks);
{code}
# How do we currently calculate/update quota for a file? I guess we need to
update the quota calculation algorithm for an INodeFile here.
# I guess the semantics of {{findEarlierSnapshotWithBlocks}} is to find the
FileDiff that satisfies: 1) its block list is not null, and 2) its snapshot id
is less than the given {{snapshotId}}. However, if the given {{snapshotId}} is
not {{CURRENT_STATE_ID}}, the current implementation may return a FileDiff
whose snapshot id is >= the given {{snapshotId}} (since {{getDiffById}} may
return a diff with snapshot id greater than the given id).
{code}
public FileDiff findEarlierSnapshotWithBlocks(int snapshotId) {
FileDiff diff = (snapshotId == Snapshot.CURRENT_STATE_ID) ?
getLast() : getDiffById(snapshotId);
BlockInfo[] snapshotBlocks = null;
while(diff != null) {
snapshotBlocks = diff.getBlocks();
if(snapshotBlocks != null)
break;
int p = getPrior(diff.getSnapshotId(), true);
diff = (p == Snapshot.NO_SNAPSHOT_ID) ? null : getDiffById(p);
}
return diff;
}
{code}
# Still for {{findEarlierSnapshotWithBlocks}}: because {{getPrior}} is
currently a {{log\(n\)}} operation, the worst-case time complexity can be
{{nlog\(n\)}}. Considering that the snapshot diff list is usually not big (we
have an upper limit on the total number of snapshots), we may consider doing
a linear scan over the file diff list instead.
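To make both of the above points concrete, here is a standalone sketch of a linear-scan version that also enforces the strict "snapshot id less than {{snapshotId}}" bound. The types are again simplified stand-ins, and {{CURRENT_STATE_ID}} is modeled as {{Integer.MAX_VALUE}} so that it compares greater than every real snapshot id:

```java
import java.util.List;

class FileDiffSketch {
    final int snapshotId;
    final Object[] blocks;   // null if this diff recorded no block list
    FileDiffSketch(int id, Object[] b) { snapshotId = id; blocks = b; }
}

class EarlierBlocksFinder {
    // Stand-in for Snapshot.CURRENT_STATE_ID; larger than any real snapshot id.
    static final int CURRENT_STATE_ID = Integer.MAX_VALUE;

    /**
     * Latest diff with a non-null block list whose snapshot id is strictly
     * less than snapshotId. Scans the sorted diff list backwards in O(n)
     * instead of repeated O(log n) getPrior lookups.
     */
    static FileDiffSketch findEarlierSnapshotWithBlocks(
            List<FileDiffSketch> diffs, int snapshotId) {
        for (int i = diffs.size() - 1; i >= 0; i--) {
            FileDiffSketch d = diffs.get(i);
            if (d.snapshotId < snapshotId && d.blocks != null) {
                return d;
            }
        }
        return null;   // no earlier diff recorded a block list
    }
}
```

With the strict {{<}} comparison, a diff at exactly the given id (or a later one returned by {{getDiffById}}) can never be mistaken for an earlier snapshot.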
# In INode.java, why do we need the following change?
{code}
public final boolean isInLatestSnapshot(final int latestSnapshotId) {
- if (latestSnapshotId == Snapshot.CURRENT_STATE_ID) {
+ if (latestSnapshotId == Snapshot.CURRENT_STATE_ID ||
+ latestSnapshotId == Snapshot.NO_SNAPSHOT_ID) {
{code}
# Nit: need to add "\{" and "\}" for the while loop according to our current
coding style. Similar for several other places (e.g.,
{{FileDiffList#destroyAndCollectSnapshotBlocks}}).
{code}
+ while(i < currentBlocks.length && i < snapshotBlocks.length &&
+ currentBlocks[i] == snapshotBlocks[i])
+ i++;
+ // Collect the remaining blocks of the file
+ while(i < currentBlocks.length)
+ collectedBlocks.addDeleteBlock(currentBlocks[i++]);
{code}
# Minor: In the following code, instead of calling {{getDiffById}} to search
for the file diff, we can let {{AbstractINodeDiffList#saveSelf2Snapshot}}
return the diff it just finds/creates.
{code}
public void saveSelf2Snapshot(int latestSnapshotId, INodeFile iNodeFile,
INodeFileAttributes snapshotCopy, boolean withBlocks)
throws QuotaExceededException {
super.saveSelf2Snapshot(latestSnapshotId, iNodeFile, snapshotCopy);
if(! withBlocks) return;
final FileDiff diff = getDiffById(latestSnapshotId);
// Store blocks if this is the first update
diff.setBlocks(iNodeFile.getBlocks());
}
{code}
# Nit: also, for the above code, maybe it's more direct to have
{code}
if (withBlocks) {
final FileDiff diff = getDiffById(latestSnapshotId);
// Store blocks if this is the first update
diff.setBlocks(iNodeFile.getBlocks());
}
{code}
> Snapshot support for truncate
> -----------------------------
>
> Key: HDFS-7056
> URL: https://issues.apache.org/jira/browse/HDFS-7056
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Konstantin Shvachko
> Assignee: Plamen Jeliazkov
> Attachments: HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx
>
>
> Implementation of truncate in HDFS-3107 does not allow truncating files which
> are in a snapshot. It is desirable to be able to truncate and still keep the
> old state of the file in the snapshot.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)