[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453667#comment-13453667
 ] 

Jesse Yates commented on HDFS-3370:
-----------------------------------

@Jaganar - with HBASE-6055 (currently in review) you get a flush of the 
memstore to HFiles (more or less coordinated between regionservers - see the 
jira for more info), which we would then _love_ to hardlink into the snapshot 
directory. HFiles live under the region directory - which lives under the 
column family and table directories - where the HFile is being served. When a 
compaction occurs, the file is moved to the .archive directory. Currently, we 
are getting around the lack of hardlinks by referencing the HFiles by name and 
then using a FileLink (also in review) to deal with the file getting archived 
out from under us when we restore the table. 
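
To make the "reference by name, then follow the file if it gets archived" 
idea concrete, here is a minimal sketch. It is not the FileLink class that is 
in review; FallbackLink, its constructor and readAllBytes() are hypothetical 
names, and it uses java.nio on a local filesystem rather than the HDFS 
FileSystem API. It only shows the lookup order: try the live location first, 
then the archive location.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical stand-in for the FileLink idea; not the HBase class. */
final class FallbackLink {
    // Candidate locations, in the order they should be tried.
    private final List<Path> candidates;

    FallbackLink(Path... locations) {
        this.candidates = List.of(locations);
    }

    /** Reads the file from the first candidate location where it exists. */
    byte[] readAllBytes() throws IOException {
        for (Path p : candidates) {
            try {
                return Files.readAllBytes(p);
            } catch (NoSuchFileException e) {
                // The file is not here (e.g. it was archived); try the next one.
            }
        }
        throw new FileNotFoundException("not found in any of " + candidates);
    }
}
```

A caller would build the link from the original HFile path plus the path the 
file would have under .archive, so a read keeps working whether or not the 
file has been moved in the meantime.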

The current implementation of snapshots in HBase is pretty close to what you 
are proposing, but spends minimal time blocking. (It is almost identical for 
'globally consistent', i.e. cross-server consistent, snapshots, but those 
quiesce for far too long to ensure consistency.) 

In short, hardlinks make snapshotting easier, but we still need both parts to 
get 'clean' restores. Otherwise, we need to do a WAL replay from the COW 
version of the WAL to get back in-memory state.
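
As a rough illustration of why hardlinks would make this easier, the sketch 
below hardlinks every file in a store directory into a snapshot directory 
using java.nio on a local filesystem. This is only an analogy: HDFS does not 
offer this call today (that is what this issue asks for), and 
HardlinkSnapshot and snapshot() are hypothetical names, not HBase code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical local-filesystem analogy; not HBase or HDFS code. */
final class HardlinkSnapshot {
    /** Hardlinks every regular file directly under storeDir into snapshotDir. */
    static void snapshot(Path storeDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        try (Stream<Path> files = Files.list(storeDir)) {
            for (Path f : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(f)) {
                    // The new link shares the file's data; nothing is copied.
                    Files.createLink(snapshotDir.resolve(f.getFileName()), f);
                }
            }
        }
    }
}
```

Once the links exist, moving the original file to an archive directory does 
not affect the snapshot copy, so the snapshot would not need to chase the file 
by name.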

Does that make sense/answer your question?
                
> HDFS hardlink
> -------------
>
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLink.pdf
>
>
> We'd like to add a new feature, hardlink, to HDFS that allows hardlinked 
> files to share data without copying. Currently we will support hardlinking 
> only closed files, but it could be extended to unclosed files as well.
> Among many potential use cases of the feature, the following two are 
> primarily used at Facebook:
> 1. This provides a lightweight way for applications like HBase to create a 
> snapshot;
> 2. This also allows an application like Hive to move a table to a different 
> directory without breaking currently running Hive queries.
