[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897136#action_12897136 ]
HBase Review Board commented on HBASE-50:
-----------------------------------------
Message from: [email protected]
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review823
-----------------------------------------------------------
I reviewed the first page of diffs. Will do the second page tomorrow. (This is a
lovely big patch, Li -- so far so good... keep up the good work.)
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
<http://review.cloudera.org/r/467/#comment2750>
This is fine for an HBase that is a fresh install, but what about the case where
the data has been migrated from an older HBase version? It won't have this
column family in .META. We should make a little migration script that adds it,
or, on start of the new version, check for it and, if not present, create it.
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
<http://review.cloudera.org/r/467/#comment2751>
ditto
src/main/java/org/apache/hadoop/hbase/TablePartialOpenException.java
<http://review.cloudera.org/r/467/#comment2752>
Maybe exception should be called TablePartiallyOpenException (Not
important. Minor).
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
<http://review.cloudera.org/r/467/#comment2753>
Can the snapshot name be empty and then we'll make one up?
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
<http://review.cloudera.org/r/467/#comment2754>
Sure, why not. It's stored in the filesystem? If so, for sure use the same rule.
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
<http://review.cloudera.org/r/467/#comment2755>
For restore of the snapshot, do you use loadtable.rb or Todd's new
bulkloading scripts?
src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
<http://review.cloudera.org/r/467/#comment2756>
This looks like an improvement.
src/main/java/org/apache/hadoop/hbase/io/Reference.java
<http://review.cloudera.org/r/467/#comment2757>
What's this? A different kind of reference?
src/main/java/org/apache/hadoop/hbase/io/Reference.java
<http://review.cloudera.org/r/467/#comment2758>
Do you need to specify the ordinals? Won't they be these numbers anyway?
Or is it in case someone adds a new type of reference?
src/main/java/org/apache/hadoop/hbase/io/Reference.java
<http://review.cloudera.org/r/467/#comment2759>
Why not just use the ordinal?
http://download-llnw.oracle.com/javase/1.5.0/docs/api/java/lang/Enum.html#ordinal()
And serialize the int?
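The ordinal approach suggested here could look something like the following minimal sketch. The class and enum names are illustrative only, not the patch's actual Reference code:

```java
import java.io.*;

public class OrdinalDemo {
    // Hypothetical stand-in for the patch's reference-type enum.
    enum Range { top, bottom }

    // Persist the enum as its ordinal, a plain int on the stream.
    static void write(DataOutput out, Range r) throws IOException {
        out.writeInt(r.ordinal());
    }

    // Recover the constant by indexing values() with the serialized ordinal.
    static Range read(DataInput in) throws IOException {
        return Range.values()[in.readInt()];
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        write(new DataOutputStream(bytes), Range.bottom);
        Range r = read(new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(r); // prints "bottom"
    }
}
```

One caveat with ordinals: reordering or inserting enum constants silently changes the wire format, which may be why the patch pins explicit numbers.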
src/main/java/org/apache/hadoop/hbase/io/Reference.java
<http://review.cloudera.org/r/467/#comment2760>
Hmm... is this good? You are dropping some of the region name in toString().
Do we have to?
src/main/java/org/apache/hadoop/hbase/io/Reference.java
<http://review.cloudera.org/r/467/#comment2765>
This could be a problem when folks migrate old data to this new HBase.
If there are References in the old data, then this deserialization will fail?
I'm fine with you creating a new issue named something like "Migration from
0.20.x hbase to 0.90" and adding a note there that we need to consider this
little issue. Better, though, would be if the data were able to migrate itself at
runtime; i.e. recognize a boolean on the stream and then deserialize the old
style into the new, etc.
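The self-migrating read suggested above could be sketched like so. All names here are hypothetical, and for illustration the "old format" is assumed to be a bare writeUTF string (whose first length byte is 0 for short names, so a negative sentinel byte can never collide with it):

```java
import java.io.*;

public class VersionedRecord {
    // Sentinel first byte marking the new format; the assumed old format
    // (raw writeUTF) starts with the high length byte, 0 for short strings.
    static final byte CURRENT_VERSION = -1;

    String regionName;

    void write(DataOutput out) throws IOException {
        out.writeByte(CURRENT_VERSION); // new records announce themselves
        out.writeUTF(regionName);
    }

    void readFields(DataInput in) throws IOException {
        byte first = in.readByte();
        if (first == CURRENT_VERSION) {
            regionName = in.readUTF();          // new format
        } else {
            regionName = legacyRead(first, in); // old format, migrate on read
        }
    }

    // Parse the assumed legacy layout: the byte already consumed was the
    // high byte of writeUTF's 2-byte length (ASCII-only for simplicity).
    static String legacyRead(byte first, DataInput in) throws IOException {
        int len = ((first & 0xff) << 8) | (in.readByte() & 0xff);
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, "UTF-8");
    }
}
```

With this shape, old data deserializes transparently on first read and is rewritten in the new format the next time it is persisted, so no offline migration pass is required.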
src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java
<http://review.cloudera.org/r/467/#comment2767>
Good.
src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
<http://review.cloudera.org/r/467/#comment2768>
Is this comment right?
src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java
<http://review.cloudera.org/r/467/#comment2770>
Why a random negative number? Why not just leave it blank?
src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java
<http://review.cloudera.org/r/467/#comment2775>
It'd be cool to break these methods up so they were static if possible, so
you could unit-test them to ensure they do as advertised.
src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java
<http://review.cloudera.org/r/467/#comment2776>
Check the result. It may not have worked. If it didn't, that deserves a WARN at
least.
src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/467/#comment2777>
You might want to check the returns from these methods.
src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/467/#comment2778>
Don't bother warning, I'd say... just throw.
src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/467/#comment2779>
Yeah, just throw... that'll show in logs anyway (I believe)
src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/467/#comment2780>
I'm with Ted on this one. Need to do something about it else it'll annoy
till the end of time.
src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java
<http://review.cloudera.org/r/467/#comment2781>
This looks like it should be a general utility method. (Not important.)
src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java
<http://review.cloudera.org/r/467/#comment2782>
If the table were big, this could be prohibitively expensive? A
single-threaded copy of all of a table's data? We could complement this with an
MR-based restore, something that does the copy using MR?
src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java
<http://review.cloudera.org/r/467/#comment2784>
This looks like a class that you could write a unit test for?
- stack
> Snapshot of table
> -----------------
>
> Key: HBASE-50
> URL: https://issues.apache.org/jira/browse/HBASE-50
> Project: HBase
> Issue Type: New Feature
> Reporter: Billy Pearson
> Assignee: Li Chongxin
> Priority: Minor
> Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot
> Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class
> Diagram.png
>
>
> Having an option to take a snapshot of a table would be very useful in
> production.
> What I would like this option to do is a merge of all the data into
> one or more files stored in the same folder on the DFS. This way we could
> save data in case of a software bug in Hadoop or user code.
> The other advantage would be being able to export a table to multiple locations.
> Say I had a read-only table that must be online. I could take a snapshot of
> it when needed, export it to a separate data center, and have it loaded
> there; then I would have it online at multiple data centers for load
> balancing and failover.
> I understand that Hadoop takes the need out of having backups to protect
> from failed servers, but this does not protect us from software bugs that
> might delete or alter data in ways we did not plan. We should have a way to
> roll back a dataset.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.