[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876699#action_12876699 ]
Li Chongxin commented on HBASE-50:
----------------------------------
bq. ... but also after snapshot is done.... your design should include
description of how files are archived, rather than deleted...
Are you talking about files that are no longer used by the hbase table but are
still referenced by a snapshot? I think this is described in chapter 6,
'Snapshot Maintenance'. For example, hfiles are archived in the delete
directory, and section 6.4 describes how these files are cleaned up.
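To make the archiving step concrete, here is a minimal sketch of moving an
hfile into the delete directory instead of removing it, using the Hadoop
FileSystem API. The paths and the class wrapper are hypothetical, for
illustration only:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArchiveHFile {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Hypothetical paths: an hfile no longer used by the table
    // but still referenced by a snapshot.
    Path hfile = new Path("/hbase/mytable/1234567890/family/hfile1");
    Path deleteDir = new Path("/hbase/.snapshot/.deleted");
    fs.mkdirs(deleteDir);
    // Rename (a cheap metadata operation on the dfs) instead of deleting,
    // so the snapshot's reference stays valid until cleanup (section 6.4).
    fs.rename(hfile, new Path(deleteDir, hfile.getName()));
  }
}
{code}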
bq. ..In fact you'll probably be doing a snapshot of at least a subset of
.META. on every table snapshot I'd imagine - at least the entries for the
relevant table.
The .META. entries for the snapshotted table have already been dumped, haven't
they? Why would we still need a snapshot of a subset of .META.?
bq. So, do you foresee your restore-from-snapshot running split over the logs
as part of the restore? That makes sense to me.
Yes, restore-from-snapshot has to run split over the WAL logs. That will take
some time, so restore-from-snapshot will not be very fast.
bq. Why you think we need a Reference to the hfile? Why not just a file that
lists the names of all the hfiles? We don't need to execute the snapshot, do
we? Restoring from a snapshot would be a bunch of file renames and wal
splitting?
At first I thought the snapshot should probably keep the table directory
structure for later use. For example, a reader like HalfStoreFileReader could
be provided so that we could read from the snapshot directly. But yes, we
don't actually execute the snapshot, so keeping a list of all the hfiles
(actually one list per RS, right?) should be enough. Also, restoring from a
snapshot is not just file renames: since an hfile might be referenced by
several snapshots, we should probably do a real copy when restoring, right?
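To make both points concrete, here is a rough sketch under my own assumptions
about the layout (one manifest file per RS under the snapshot dir, hypothetical
paths throughout): the hfile list is written at snapshot time, and restore
copies rather than renames because an hfile may be shared by several snapshots:
{code:java}
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class SnapshotManifest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // 1. At snapshot time, each RS writes the hfiles it serves to its own
    //    list file. The file name and layout are hypothetical.
    Path manifest = new Path("/hbase/.snapshot/snap1/rs1.hfiles");
    BufferedWriter out =
        new BufferedWriter(new OutputStreamWriter(fs.create(manifest)));
    out.write("/hbase/mytable/1234567890/family/hfile1");
    out.newLine();
    out.close();

    // 2. At restore time, copy instead of rename: the hfile may still be
    //    referenced by other snapshots, so it must stay where it is.
    Path src = new Path("/hbase/mytable/1234567890/family/hfile1");
    Path dst = new Path("/hbase/restoredtable/1234567890/family/hfile1");
    FileUtil.copy(fs, src, fs, dst, false, conf);
  }
}
{code}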
bq. Shall we name the new .META. column family snapshot rather than reference?
Sure.
bq. On the filename '.deleted', I think it a mistake to give it a '.' prefix
especially given its in the snapshot dir...
Ok, I will rename the snapshot dir to '.snapshot'. For the dir '.deleted',
what name do you think we should use? Because there might be several snapshots
under the '.snapshot' dir, each with its own snapshot name, I named this dir
'.deleted' to distinguish it from the snapshot names.
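For illustration, the layout I have in mind is roughly the following (the
names are only a proposal):
{noformat}
/hbase/.snapshot/
    snapshotA/      <- one subdir per snapshot, keyed by snapshot name
    snapshotB/
    .deleted/       <- archived hfiles that snapshots still reference
{noformat}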
bq. Do you need a new catalog table called snapshots to keep list of snapshots,
of what a snapshot comprises and some other metadata such as when it was made,
whether it succeeded, who did it and why?
It would be much more convenient if a catalog table 'snapshot' could be
created. Would this impact the normal operation of hbase?
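If the snapshot metadata ends up in an ordinary table rather than a true
catalog table like .META., creating it might look like this minimal sketch;
the table name comes from the discussion above, but the 'info' family and its
contents are my assumptions:
{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateSnapshotCatalog {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
    HTableDescriptor desc = new HTableDescriptor("snapshot");
    // One family holding the metadata: creation time, status, user, reason.
    desc.addFamily(new HColumnDescriptor("info"));
    admin.createTable(desc);
  }
}
{code}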
bq. Section 7.4 is missing split of WAL files. Perhaps this can be done in a MR
job?
I'll add the split of the WAL logs. Yes, an MR job can be used. Which method
do you think is better: read from the imported files and insert into the table
through the hbase api, or just copy the hfiles into place and update .META.?
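For the first method, here is a minimal sketch of re-inserting one recovered
edit through the client API, using the 0.20-era HTable/Put calls; the table
name, row, family, and values are placeholders:
{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RestoreByPut {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "restoredtable");
    // In a real restore this would loop over the edits recovered from the
    // split WAL files; here a single placeholder edit.
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("family"), Bytes.toBytes("qualifier"),
        Bytes.toBytes("value"));
    table.put(put);
    table.flushCommits();
  }
}
{code}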
bq. Lets not have the master run the snapshot... let the client run it?
bq. Snapshot will be doing same thing whether table is partially online or not..
I put these two issues together because I think they are related. In the
current design, if a table is open, the snapshot is performed by each RS that
serves the table's regions. Otherwise, if a table is closed, the snapshot is
performed by the master, because the table is not served by any RS. The first
comment is about a closed table, so the master performs the snapshot there
because the client does not have access to the underlying dfs. For the second
one, I was thinking that if a table is partially online, its regions might be
partially served by RSs and partially offline, right? Then who will perform
the snapshot? If the RSs, the offline regions will be missed. If the master,
the online regions might lose the data in their memstores. I'm confused.
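Maybe one way out is to decide per region rather than per table. The following
is purely illustrative, with hypothetical stubs standing in for real master/RS
calls, not actual HBase API:
{code:java}
import java.util.Arrays;
import java.util.List;

public class SnapshotDispatch {
  // Hypothetical stubs, not HBase API.
  static boolean regionIsOnline(String region) { return true; }
  static void askRegionServerToSnapshot(String region) { }
  static void masterSnapshotsRegionFiles(String region) { }

  public static void main(String[] args) {
    List<String> regionsOfTable = Arrays.asList("region1", "region2");
    for (String region : regionsOfTable) {
      if (regionIsOnline(region)) {
        // Online region: the serving RS flushes its memstore first, so no
        // in-memory edits are lost, then snapshots its files.
        askRegionServerToSnapshot(region);
      } else {
        // Offline region: no memstore to lose, so the master can work on
        // the region's files directly on the dfs.
        masterSnapshotsRegionFiles(region);
      }
    }
  }
}
{code}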
bq. It's a synchronous way. Do you think this is appropriate? Yes. I'm w/ JG on
this.
This is another problem that confuses me. In the current design (which is
synchronous), a snapshot is started only when all the RSs are ready for it;
then all the RSs perform the snapshot concurrently. This guarantees that the
snapshot is not started if one RS fails. If we switch to an asynchronous
approach, should each RS start its snapshot immediately when it is ready?
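For reference, a minimal sketch of how the "all RSs ready before start"
barrier could be expressed with ZooKeeper; the znode layout, the
expected-count check, and the pre-existing parent znodes are my assumptions,
not part of the current design:
{code:java}
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SnapshotBarrier {
  public static void main(String[] args) throws Exception {
    int expectedRegionServers = 3; // assumed known to the coordinator
    ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, new Watcher() {
      public void process(WatchedEvent event) { /* no-op for the sketch */ }
    });

    // Each RS announces readiness with an ephemeral znode: if the RS dies,
    // its znode disappears and the snapshot is not started. Assumes the
    // parent znodes /snapshot/snap1/ready already exist.
    zk.create("/snapshot/snap1/ready/rs1", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

    // The coordinator starts the snapshot only once every RS has checked in.
    List<String> ready = zk.getChildren("/snapshot/snap1/ready", false);
    if (ready.size() == expectedRegionServers) {
      zk.create("/snapshot/snap1/start", new byte[0],
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
  }
}
{code}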
> Snapshot of table
> -----------------
>
> Key: HBASE-50
> URL: https://issues.apache.org/jira/browse/HBASE-50
> Project: HBase
> Issue Type: New Feature
> Reporter: Billy Pearson
> Assignee: Li Chongxin
> Priority: Minor
> Attachments: HBase Snapshot Design Report V2.pdf, snapshot-src.zip
>
>
> Having an option to take a snapshot of a table would be very useful in
> production.
> What I would like to see this option do is a merge of all the data into
> one or more files stored in the same folder on the dfs. This way we could
> save data in case of a software bug in hadoop or user code.
> The other advantage would be to be able to export a table to multiple
> locations. Say I had a read_only table that must be online. I could take a
> snapshot of it when needed and export it to a separate data center and have
> it loaded there, and then I would have it online at multiple data centers
> for load balancing and failover.
> I understand that hadoop removes the need for backups to protect from
> failed servers, but this does not protect us from software bugs that might
> delete or alter data in ways we did not plan. We should have a way we can
> roll back a dataset.