[
https://issues.apache.org/jira/browse/HBASE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409066#comment-13409066
]
stack commented on HBASE-6233:
------------------------------
Looking at the doc. again:
Is there a table dir missing from this: "● /hbase/.snapshots/<snapshot
name>/<region>/<cf>/<hfiles>"?
We have a filter in front of the Filesystem now, HFileSystem. We could
instrument 'delete' to move the file rather than delete it if a snapshot has
happened and we want to keep deleted files around. I thought we could implement
link here too, calling through if reflection determines it is present and doing
whatever the alternative is when it's not there (would be some ugly casting to
HFileSystem, I suppose).
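To make the 'delete' idea concrete, here is a rough sketch. It uses
java.nio.file on a local filesystem as a stand-in, not the real HFileSystem or
Hadoop FileSystem API; the class name ArchivingDeleter, the archive directory,
and the snapshotTaken flag are all made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in for a filter in front of the filesystem (not HFileSystem itself). */
final class ArchivingDeleter {
    private final Path archiveDir;       // assumed location for files we keep around
    private final boolean snapshotTaken; // assumed flag: has any snapshot happened?

    ArchivingDeleter(Path archiveDir, boolean snapshotTaken) {
        this.archiveDir = archiveDir;
        this.snapshotTaken = snapshotTaken;
    }

    /** Returns the archived path if the file was moved, or null if it was really deleted. */
    Path delete(Path hfile) throws IOException {
        if (!snapshotTaken) {
            Files.delete(hfile); // no snapshot: a plain delete is fine
            return null;
        }
        // A snapshot may still need this file, so move it instead of deleting it.
        Files.createDirectories(archiveDir);
        Path target = archiveDir.resolve(hfile.getFileName());
        return Files.move(hfile, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Callers keep calling 'delete' as before; only the filter knows whether the file
was removed or parked in the archive.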
It's a 1000ft view, I know, but when restoring a snapshot, won't we have to
create the table directory structure to move the hardlinked hfiles back into
place?
On keeping refcount in .META., Enis's suggestion over in HBASE-6205...
bq. When a file is deleted due to a compaction/region deletion we need to move
that file somewhere and update all the references.
bq. Also having lots of file can slow down the .META. operations.
We have to move it? We can't just decrement references? We'd have to undo the
association of files with particular regions -- the layout under
${HBASE.ROOTDIR} would not be as it is now. We'd present a logical view that
was detached from how the hfiles were stored in hdfs.
Other advantages of the refcount in .META. would be no need of moving files
around or of keeping refs in hdfs... as many refs as snapshots.
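A toy sketch of the "just decrement references" idea, to show why no file has
to move. It is an in-memory map only; a real version would keep the counts in
.META., and the class and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of per-hfile reference counts (not actual .META. code). */
final class HFileRefCounts {
    private final Map<String, Integer> refs = new HashMap<>();

    /** Adds one reference (the owning table, or a snapshot) and returns the new count. */
    int incr(String hfile) {
        return refs.merge(hfile, 1, Integer::sum);
    }

    /** Drops one reference; returns true when nothing refers to the hfile any more. */
    boolean decr(String hfile) {
        Integer n = refs.get(hfile);
        if (n == null) {
            throw new IllegalStateException("no references: " + hfile);
        }
        if (n == 1) {
            refs.remove(hfile);
            return true; // last reference gone: safe to delete the file
        }
        refs.put(hfile, n - 1);
        return false; // still referenced, e.g. by a snapshot
    }
}
```

A compaction or region deletion would call decr and delete the hfile only when
it returns true; taking a snapshot would call incr once per hfile.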
I think the below will take a good amount of time on a loaded table of
significant size (say a ten-node cluster with a table with 100 regions per
node, with say two column families with say three storefiles each):
{code}
○ Move the hfile to archive
○ Create a symlink to point to the archived file
○ Create a symlink for the snapshot
{code}
Even meta operations on namenode can take a good bit of time.
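Back-of-envelope for the example above, reading "ten" as ten nodes and
counting the three listed steps as one namenode operation each (both are
assumptions; the helper names are made up):

```java
/** Rough count of hfiles and namenode operations for one snapshot. */
final class SnapshotOpEstimate {
    /** Total hfiles = nodes * regions per node * column families * storefiles per family. */
    static long hfiles(int nodes, int regionsPerNode, int families, int storefilesPerFamily) {
        return (long) nodes * regionsPerNode * families * storefilesPerFamily;
    }

    /** Namenode operations, assuming a fixed number of operations per hfile. */
    static long namenodeOps(long hfiles, int opsPerHfile) {
        return hfiles * opsPerHfile;
    }

    public static void main(String[] args) {
        long files = hfiles(10, 100, 2, 3);    // 6000 hfiles
        long ops = namenodeOps(files, 3);      // move + two symlinks per hfile = 18000
        System.out.println(files + " hfiles, " + ops + " namenode ops");
    }
}
```

So roughly 6,000 hfiles and 18,000 namenode operations for a single snapshot
of this one table, before counting any directory creation.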
Restores would be fast (You can symlink a directory? I've not used them).
Reference files do have the advantage you suggest that it'll be easy to move
to hardlinks from symlinks (but again, I see the ops taking a long time, even
if just meta ops -- is it ok that a snapshot takes a good amount of time...
minutes?)
Your doc. is good Matteo.
I think it'll take
> [brainstorm] snapshots: hardlink alternatives
> ---------------------------------------------
>
> Key: HBASE-6233
> URL: https://issues.apache.org/jira/browse/HBASE-6233
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Matteo Bertozzi
> Assignee: Matteo Bertozzi
> Attachments: Restore-Snapshot-Hardlink-alternatives.pdf
>
>
> Discussion ticket around snapshots and hardlink alternatives.
> (See the HDFS-3370 discussion about hardlink and implementation problems)
> (taking for a moment WAL out of the discussion and focusing on hfiles)
> With hardlinks available taking snapshot will be fairly easy:
> * (hfiles are immutable)
> * hardlink to .snapshot/name to take snapshot
> * hardlink from .snapshot/name to restore the snapshot
> * No code change needed (on fs.delete() only one reference is deleted)
> but we don't have hardlinks, what are the alternatives?