[
https://issues.apache.org/jira/browse/HDFS-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787284#comment-13787284
]
Chris Nauroth commented on HDFS-5303:
-------------------------------------
bq. I don't think this is a problem.
I agree, mostly because after thinking about this, I can't come up with a
reasonable implementation for maintaining state outside the snapshot root while
still retaining the current performance characteristics of snapshots.
# To implement copy-on-write on symlink targets outside the snapshot root, we'd
need an O\(n\) scan (n = # inodes) to find if there is a snapshot anywhere
containing a symlink pointing to that inode.
# We could optimize the above by storing back-references from the target inode
to the symlink inode, but then we'd cause an O\(n\) increase in storage cost.
# If we choose to reject creation of snapshots containing a symlink outside the
snapshot root, then we need to scan the subtree under the snapshot root, so
snapshot creation is no longer O\(1\). (Again, we could throw storage at this
problem to optimize, but it seems wasteful.)
bq. Every other snapshotting filesystem (btrfs, ZFS, etc) allows snapshots to
contain symlinks to paths outside the snapshot.
I think there is a greater chance for user confusion with symlinks in HDFS
snapshots (can be created for any arbitrary subtree) compared to other
snapshotting file systems (can be created at the volume). With a whole-volume
snapshot, I believe the user still can assume immutability as long as the
symlinks don't jump out of the volume. (Full disclosure: I'm making some
assumptions here. I don't have much real experience using ZFS.)
At this point, I propose that the scope of this jira is to update the snapshot
documentation to warn end users about the pitfalls of combining snapshots and
symlinks. (No code changes.)
> A symlink within a snapshot pointing to a target outside the snapshot root
> can cause the snapshot contents to appear to change.
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-5303
> URL: https://issues.apache.org/jira/browse/HDFS-5303
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: documentation
> Reporter: Chris Nauroth
>
> A snapshot is supposed to represent the point-in-time state of a directory.
> However, if the directory contains a symlink that targets a file outside the
> snapshot root, then the snapshot contents will appear to change if someone
> changes the target file (i.e. delete or append).
--
This message was sent by Atlassian JIRA
(v6.1#6144)