[
https://issues.apache.org/jira/browse/HADOOP-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565620#action_12565620
]
dhruba borthakur commented on HADOOP-2655:
------------------------------------------
The datanode has a new per-volume directory called "detachDir". This directory
is used for temporary copy-on-write of data blocks that belong to a
snapshot.
When a client writes a block that is linked to a snapshot, it does the
following (a sketch appears after the list):
1. Copy the original block file to a temporary file in detachDir.
2. Rename the newly created file in detachDir over the original file. This
atomically breaks the hardlink, leaving two independent copies of the block.
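
Roughly, the sequence looks like the sketch below. This is illustrative Java
only; the class and method names (BlockDetachSketch, detachBlock) are
hypothetical and this is not the actual datanode code.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

class BlockDetachSketch {

    // Breaks the hardlink between a snapshot and the block a writer is
    // about to modify, leaving the snapshot's copy untouched.
    static void detachBlock(File blockFile, File detachDir) throws IOException {
        File tmp = new File(detachDir, blockFile.getName());

        // Step 1: copy the block's bytes into a private file under detachDir.
        Files.copy(blockFile.toPath(), tmp.toPath(),
                   StandardCopyOption.REPLACE_EXISTING);

        // Step 2: rename the copy over the original. On POSIX systems
        // rename(2) atomically replaces the target, so readers always see
        // either the old hardlinked file or the new private copy.
        if (!tmp.renameTo(blockFile)) {
            throw new IOException("rename of " + tmp + " to "
                                  + blockFile + " failed");
        }
    }
}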
Point 2 works perfectly on the Linux platform. The following are some caveats
on the Windows platform.
On the Windows platform, the rename fails because the target file already
exists. Thus, the code issues a delete followed by a rename. This means that
there is a window of opportunity (on Windows) during which the block does not
exist in the right place. If a read request for the block arrives precisely in
that window, the client will get an exception and will try to read that block
from an alternate location. (When a datanode restarts, it recovers blocks that
exist in detachDir but do not exist in the original data directory.) I am
proposing that this is an acceptable solution.
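
The fallback and the restart recovery could look like the following sketch.
Again, this is illustrative Java under assumed names (WindowsDetachSketch,
replaceBlockFile, recoverDetachedBlocks), not the real datanode code.

import java.io.File;
import java.io.IOException;

class WindowsDetachSketch {

    // Rename that tolerates Windows' refusal to replace an existing target.
    static void replaceBlockFile(File tmp, File blockFile) throws IOException {
        if (tmp.renameTo(blockFile)) {
            return;                       // POSIX path: atomic replace.
        }
        // Windows path: delete the target first, then rename. Between the
        // delete and the rename the block is missing from its usual place;
        // a concurrent reader gets an exception and retries another replica.
        if (!blockFile.delete()) {
            throw new IOException("could not delete " + blockFile);
        }
        if (!tmp.renameTo(blockFile)) {
            throw new IOException("rename of " + tmp + " failed");
        }
    }

    // On datanode restart, finish any detach that was interrupted midway:
    // a file present in detachDir but absent from the data directory is
    // moved back into place.
    static void recoverDetachedBlocks(File detachDir, File dataDir) {
        File[] leftovers = detachDir.listFiles();
        if (leftovers == null) {
            return;
        }
        for (File tmp : leftovers) {
            File blockFile = new File(dataDir, tmp.getName());
            if (!blockFile.exists()) {
                tmp.renameTo(blockFile);  // restore the interrupted rename
            }
        }
    }
}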
> Copy on write for data and metadata files in the presence of snapshots
> ----------------------------------------------------------------------
>
> Key: HADOOP-2655
> URL: https://issues.apache.org/jira/browse/HADOOP-2655
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Reporter: dhruba borthakur
>
> If a DFS Client wants to append data to an existing file (appends,
> HADOOP-1700) and a snapshot is present, the Datanode has to implement some
> form of copy-on-write for writes to data and metadata files.