[
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168334#comment-14168334
]
Konstantin Shvachko commented on HDFS-7056:
-------------------------------------------
Here are some details of the implementation. LMK if it sounds reasonable. I'll
update the design doc accordingly.
When a file is in a snapshot and the truncate point is not on a block boundary,
a new last block is created for the truncated file. It holds the data of the
original last block up to the truncate point. The original block remains
unchanged in the snapshot copy of the file.
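As a rough illustration of the difference between copying and truncating in
place, here is a minimal Java sketch that works on plain local files. The class
and method names (CopyOnTruncateSketch, copyPrefix, truncateInPlace) are
hypothetical and are not DataNode code. A real replica also has a checksum meta
file, which this sketch ignores.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not DataNode code: contrasts copy-on-truncate with
// in-place truncation using ordinary files as stand-ins for block replicas.
public class CopyOnTruncateSketch {

    // Copy the first newLength bytes of the original block file into a new
    // block file. The original file is only read, so a snapshot that still
    // references it keeps seeing the full data.
    public static void copyPrefix(Path original, Path newBlock, long newLength)
            throws IOException {
        if (newLength < 0 || newLength > Files.size(original)) {
            throw new IllegalArgumentException("bad truncate length: " + newLength);
        }
        try (FileChannel in = FileChannel.open(original, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(newBlock,
                     StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            long pos = 0;
            while (pos < newLength) {
                // transferTo may move fewer bytes than requested, so loop.
                long n = in.transferTo(pos, newLength - pos, out);
                if (n <= 0) {
                    throw new IOException("no progress copying block prefix");
                }
                pos += n;
            }
        }
    }

    // In-place truncation: cheaper, but destroys the tail of the block, so it
    // is only safe when no snapshot references the block.
    public static void truncateInPlace(Path block, long newLength)
            throws IOException {
        try (FileChannel ch = FileChannel.open(block, StandardOpenOption.WRITE)) {
            ch.truncate(newLength);
        }
    }
}
```

In this sketch the caller would pick copyPrefix when the file is in a snapshot
and truncateInPlace otherwise.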
There are two main parts in implementing this.
# The truncate recovery on a DN will copy a part of the last block replica into
a new block file, instead of just truncating the replica as HDFS-3107 does now.
The in-place truncating logic can be kept as an optimization for the case when
the file is not in a snapshot.
# SnapshotCopy of INodeFile should be extended with a list of blocks,
referencing the blocks that composed the file when the snapshot was taken. When
there are multiple snapshots of the same file, each snapshot copy may have a
different list of blocks if the file has been truncated and appended between
the snapshots. As a part of this change we will need to adjust the logic for
deleting a file and deleting a snapshot, because some blocks may need to be
invalidated while others may not. Here is an overview of operations related to
the change.
#* There is no change in the createSnapshot operation. The snapshot diffs will be
introduced when the files are actually modified.
#* Append does not change, because the file has its own list of blocks, which
is separate from the lists of the snapshot copies.
#* File delete should check if a block belongs to a snapshot before sending it
to the invalidates queue. Only the initial prefix of blocks common with the
latest snapshot should be retained in BlocksMap. Therefore removeFile should
find the common prefix of blocks with the latest snapshot and invalidate the
rest of them.
#* Deleting a snapshot copy of a file should check if a block belongs to
another snapshot before sending it to the invalidates queue. It is not
necessary to check all snapshots; only the previous and the next snapshots
need to be checked for blocks in common with the snapshot being deleted.
Therefore, when removing a snapshot, one should find the prefix of blocks of
that snapshot which is in common with either the previous or the next
snapshot. The rest of the blocks can be invalidated.
#* I would propose to copy the entire list of blocks to the snapshot copy. This
will simplify the implementation. We can optimize this later by storing
references only to the blocks that are different between the current state of
the file and the snapshot.
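To make the prefix rule in the two delete cases above concrete, here is a
minimal Java sketch. Blocks are modeled as plain block IDs in a List<Long>. The
class and method names are hypothetical, not NameNode code, and generation
stamps, block lengths and replication are ignored. Passing an empty list when
there is no previous or next snapshot is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not NameNode code: each file or snapshot copy is
// represented only by its ordered list of block IDs.
public class SnapshotBlockLists {

    // Number of leading blocks that two block lists have in common.
    public static int commonPrefixLength(List<Long> a, List<Long> b) {
        int n = Math.min(a.size(), b.size());
        int i = 0;
        while (i < n && a.get(i).equals(b.get(i))) {
            i++;
        }
        return i;
    }

    // File delete: keep the prefix shared with the latest snapshot and
    // invalidate the remaining blocks of the file.
    public static List<Long> blocksToInvalidateOnFileDelete(
            List<Long> fileBlocks, List<Long> latestSnapshotBlocks) {
        int keep = commonPrefixLength(fileBlocks, latestSnapshotBlocks);
        return new ArrayList<>(fileBlocks.subList(keep, fileBlocks.size()));
    }

    // Snapshot delete: keep the longer of the prefixes shared with the
    // previous and the next snapshot and invalidate the remaining blocks.
    public static List<Long> blocksToInvalidateOnSnapshotDelete(
            List<Long> prev, List<Long> cur, List<Long> next) {
        int keep = Math.max(commonPrefixLength(cur, prev),
                            commonPrefixLength(cur, next));
        return new ArrayList<>(cur.subList(keep, cur.size()));
    }

    public static void main(String[] args) {
        List<Long> s1 = Arrays.asList(1L, 2L, 3L);       // snapshot before truncate
        List<Long> s2 = Arrays.asList(1L, 2L, 4L);       // block 3 replaced by block 4
        List<Long> file = Arrays.asList(1L, 2L, 4L, 5L); // block 5 appended later
        List<Long> none = Collections.emptyList();

        System.out.println(blocksToInvalidateOnFileDelete(file, s2));
        System.out.println(blocksToInvalidateOnSnapshotDelete(none, s1, s2));
    }
}
```

In this example, deleting the file would invalidate only block 5, and deleting
the first snapshot would invalidate only block 3, since blocks 1 and 2 are
still shared with the second snapshot.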
> Snapshot support for truncate
> -----------------------------
>
> Key: HDFS-7056
> URL: https://issues.apache.org/jira/browse/HDFS-7056
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Konstantin Shvachko
>
> Implementation of truncate in HDFS-3107 does not allow truncating files which
> are in a snapshot. It is desirable to be able to truncate and still keep the
> old state of the file in the snapshot.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)