[
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897382#action_12897382
]
HBase Review Board commented on HBASE-50:
-----------------------------------------
Message from: [email protected]
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review840
-----------------------------------------------------------
This is looking really great Li. If you can address the comments below in a
new version, lets get this patch committed.
Unfortunately, the master rewrite is going to change a bunch of the stuff that
this patch depends on. For example, BaseScanner is going away. But, thats not
your issue.
What about scaling? The only problematic issue I saw was copy of storefiles
on restore. We should file an issue to do that via a MR job.
src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java
<http://review.cloudera.org/r/467/#comment2820>
final
src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java
<http://review.cloudera.org/r/467/#comment2821>
Why let this out?
src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java
<http://review.cloudera.org/r/467/#comment2822>
Want to remove this or enable the assertion? One or the other I'd say
rather than this.
src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java
<http://review.cloudera.org/r/467/#comment2823>
This class looks good.
src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java
<http://review.cloudera.org/r/467/#comment2824>
Its a pity this class is named so. We're about to bring in a new patch
that redoes the zk stuff -- breaks it up into pieces each with a singular
purpose; e.g. tracking root location or tracking meta region server -- and
unfortunately the pattern is to name these purposed classes *Tracker. There'll
be a clash of this kinda Tracker and the new zk Trackers. Not important, just
saying in case you have another name in mind for this class.
src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java
<http://review.cloudera.org/r/467/#comment2825>
Can you make this a configuration? int maxRetries =
Configuration.getInt("hbase.snapshot.retries", 3); or something? It doesn't
have to go into hbase-default.xml
src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java
<http://review.cloudera.org/r/467/#comment2826>
Yeah, can the max retries be made into a data member rather than hardcoded
everywhere in this class?
src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java
<http://review.cloudera.org/r/467/#comment2827>
Looks like you need more explaination here? This is a special case right
where the table is offline and we're asked to make a snapshot of it?
src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java
<http://review.cloudera.org/r/467/#comment2828>
Its not a backup, its creating references, right?
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/467/#comment2829>
This looks like an improvement, making it static so can be used more
generally.
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/467/#comment2830>
And flushing is disabled at this point too, right? Compactions? (Good).
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/467/#comment2831>
If snapshot fails, do we have to do cleanup?
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.cloudera.org/r/467/#comment2832>
Good
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.cloudera.org/r/467/#comment2833>
This looks good.
src/main/java/org/apache/hadoop/hbase/regionserver/SnapshotThread.java
<http://review.cloudera.org/r/467/#comment2834>
Call this Snapshotter instead?
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<http://review.cloudera.org/r/467/#comment2835>
Do we have to do this down at the Store level? Coud we move it up to
Region or up to the RegionServer itself? It already has an HTable instance.
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<http://review.cloudera.org/r/467/#comment2836>
Why you have to pass the reference? It wasn't needed previously?
src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
<http://review.cloudera.org/r/467/#comment2837>
Does this stuff belong in here in this general utility class? Should it be
polluted with References? Should this stuff be over in io package where the
Reference is or static methods on Reference?
src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java
<http://review.cloudera.org/r/467/#comment2838>
What about a test of restore from snapshot? Is there one? I dont' see it?
src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java
<http://review.cloudera.org/r/467/#comment2839>
Good test
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionSnapshot.java
<http://review.cloudera.org/r/467/#comment2840>
There is getTestDir(final String subdirName) FYI
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionSnapshot.java
<http://review.cloudera.org/r/467/#comment2841>
Looks good.
src/test/java/org/apache/hadoop/hbase/regionserver/TestZKSnapshotWatcher.java
<http://review.cloudera.org/r/467/#comment2842>
Great
- stack
> Snapshot of table
> -----------------
>
> Key: HBASE-50
> URL: https://issues.apache.org/jira/browse/HBASE-50
> Project: HBase
> Issue Type: New Feature
> Reporter: Billy Pearson
> Assignee: Li Chongxin
> Priority: Minor
> Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot
> Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class
> Diagram.png
>
>
> Havening an option to take a snapshot of a table would be vary useful in
> production.
> What I would like to see this option do is do a merge of all the data into
> one or more files stored in the same folder on the dfs. This way we could
> save data in case of a software bug in hadoop or user code.
> The other advantage would be to be able to export a table to multi locations.
> Say I had a read_only table that must be online. I could take a snapshot of
> it when needed and export it to a separate data center and have it loaded
> there and then i would have it online at multi data centers for load
> balancing and failover.
> I understand that hadoop takes the need out of havening backup to protect
> from failed servers, but this does not protect use from software bugs that
> might delete or alter data in ways we did not plan. We should have a way we
> can roll back a dataset.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.