[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897136#action_12897136 ]
HBase Review Board commented on HBASE-50:
-----------------------------------------
Message from: [email protected]
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review823
-----------------------------------------------------------
I reviewed the first page of diffs. Will do the second page tomorrow. (This is a
lovely big patch, Li -- so far so good... keep up the good work.)
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
<http://review.cloudera.org/r/467/#comment2750>
This is fine for an HBase that is a fresh install, but what about the case where
the data has been migrated from an older HBase version? It won't have this
column family in .META. We should make a little migration script that adds it,
or, on start of the new version, check for it and, if not present, create it.
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
<http://review.cloudera.org/r/467/#comment2751>
ditto
src/main/java/org/apache/hadoop/hbase/TablePartialOpenException.java
<http://review.cloudera.org/r/467/#comment2752>
Maybe exception should be called TablePartiallyOpenException (Not
important. Minor).
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
<http://review.cloudera.org/r/467/#comment2753>
Can the snapshot name be empty and then we'll make one up?
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
<http://review.cloudera.org/r/467/#comment2754>
Sure, why not. It's stored in the filesystem? If so, for sure use the same rule.
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
<http://review.cloudera.org/r/467/#comment2755>
For restore of the snapshot, do you use loadtable.rb or Todd's new
bulkloading scripts?
src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
<http://review.cloudera.org/r/467/#comment2756>
This looks like an improvement.
src/main/java/org/apache/hadoop/hbase/io/Reference.java
<http://review.cloudera.org/r/467/#comment2757>
What's this? A different kind of reference?
src/main/java/org/apache/hadoop/hbase/io/Reference.java
<http://review.cloudera.org/r/467/#comment2758>
Do you need to specify the ordinals? Won't they be these numbers anyway?
Or is it in case someone adds a new type of reference?
src/main/java/org/apache/hadoop/hbase/io/Reference.java
<http://review.cloudera.org/r/467/#comment2759>
Why not just use the ordinal?
http://download-llnw.oracle.com/javase/1.5.0/docs/api/java/lang/Enum.html#ordinal()
And serialize the int?
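The ordinal approach suggested here could look something like the following minimal sketch. The class and enum names are illustrative only, not the patch's actual Reference code:

```java
import java.io.*;

public class OrdinalDemo {
    // Hypothetical stand-in for the patch's reference-type enum.
    enum Range { top, bottom }

    // Persist the enum as its ordinal, a plain int on the stream.
    static void write(DataOutput out, Range r) throws IOException {
        out.writeInt(r.ordinal());
    }

    // Recover the constant by indexing values() with the serialized ordinal.
    static Range read(DataInput in) throws IOException {
        return Range.values()[in.readInt()];
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        write(new DataOutputStream(bytes), Range.bottom);
        Range r = read(new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(r); // prints "bottom"
    }
}
```

One caveat with ordinals: reordering or inserting enum constants silently changes the wire format, which may be why the patch pins explicit numbers.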
src/main/java/org/apache/hadoop/hbase/io/Reference.java
<http://review.cloudera.org/r/467/#comment2760>
Hmm... is this good? You are dropping some of the region name in toString().
Do we have to?
src/main/java/org/apache/hadoop/hbase/io/Reference.java
<http://review.cloudera.org/r/467/#comment2765>
This could be a problem when folks migrate old data to this new HBase.
If there are References in the old data, then this deserialization will fail?
I'm fine with you creating a new issue named something like "Migration from
0.20.x hbase to 0.90" and adding a note there that we need to consider this
little issue. Better, though, would be if the data were able to migrate itself at
runtime; i.e. recognize a boolean on the stream and then deserialize the old
style into the new, etc.
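The self-migrating read suggested above could be sketched like so. All names here are hypothetical, and for illustration the "old format" is assumed to be a bare writeUTF string (whose first length byte is 0 for short names, so a negative sentinel byte can never collide with it):

```java
import java.io.*;

public class VersionedRecord {
    // Sentinel first byte marking the new format; the assumed old format
    // (raw writeUTF) starts with the high length byte, 0 for short strings.
    static final byte CURRENT_VERSION = -1;

    String regionName;

    void write(DataOutput out) throws IOException {
        out.writeByte(CURRENT_VERSION); // new records announce themselves
        out.writeUTF(regionName);
    }

    void readFields(DataInput in) throws IOException {
        byte first = in.readByte();
        if (first == CURRENT_VERSION) {
            regionName = in.readUTF();          // new format
        } else {
            regionName = legacyRead(first, in); // old format, migrate on read
        }
    }

    // Parse the assumed legacy layout: the byte already consumed was the
    // high byte of writeUTF's 2-byte length (ASCII-only for simplicity).
    static String legacyRead(byte first, DataInput in) throws IOException {
        int len = ((first & 0xff) << 8) | (in.readByte() & 0xff);
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, "UTF-8");
    }
}
```

With this shape, old data deserializes transparently on first read and is rewritten in the new format the next time it is persisted, so no offline migration pass is required.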
src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java
<http://review.cloudera.org/r/467/#comment2767>
Good.
src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
<http://review.cloudera.org/r/467/#comment2768>
Is this comment right?
src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java
<http://review.cloudera.org/r/467/#comment2770>
Why a random negative number? Why not just leave it blank?
src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java
<http://review.cloudera.org/r/467/#comment2775>
It'd be cool to break these methods up so they were static if possible, so
you could unit-test them to ensure they do as advertised.
src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java
<http://review.cloudera.org/r/467/#comment2776>
Check the result. It may not have worked. If it didn't, that deserves a WARN at
least.
src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/467/#comment2777>
You might want to check the returns from these methods.
src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/467/#comment2778>
Don't bother warning, I'd say... just throw.
src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/467/#comment2779>
Yeah, just throw... that'll show in logs anyway (I believe)
src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/467/#comment2780>
I'm with Ted on this one. Need to do something about it else it'll annoy
till the end of time.
src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java
<http://review.cloudera.org/r/467/#comment2781>
This looks like it should be a general utility method. (Not important.)
src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java
<http://review.cloudera.org/r/467/#comment2782>
If the table were big, this could be prohibitively expensive? A
single-threaded copy of all of a table's data? We could complement this with an
MR-based restore, something that does the copy using MR?
src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java
<http://review.cloudera.org/r/467/#comment2784>
This looks like a class that you could write a unit test for?
- stack
> Snapshot of table
> -----------------
>
> Key: HBASE-50
> URL: https://issues.apache.org/jira/browse/HBASE-50
> Project: HBase
> Issue Type: New Feature
> Reporter: Billy Pearson
> Assignee: Li Chongxin
> Priority: Minor
> Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot
> Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class
> Diagram.png
>
>
> Having an option to take a snapshot of a table would be very useful in
> production.
> What I would like this option to do is a merge of all the data into
> one or more files stored in the same folder on the DFS. This way we could
> save data in case of a software bug in Hadoop or user code.
> The other advantage would be being able to export a table to multiple locations.
> Say I had a read-only table that must be online. I could take a snapshot of
> it when needed, export it to a separate data center, and have it loaded
> there; then I would have it online at multiple data centers for load
> balancing and failover.
> I understand that Hadoop takes the need out of having backups to protect
> from failed servers, but this does not protect us from software bugs that
> might delete or alter data in ways we did not plan. We should have a way to
> roll back a dataset.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.