[
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897250#action_12897250
]
HBase Review Board commented on HBASE-50:
-----------------------------------------
Message from: "Chongxin Li" <[email protected]>
bq. On 2010-08-10 21:34:40, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java, line 673
bq. > <http://review.cloudera.org/r/467/diff/3/?file=6002#file6002line673>
bq. >
bq. > This is fine for an hbase that is a fresh install, but what about the
bq. > case where the data has been migrated from an older hbase version? It
bq. > won't have this column family in .META. We should make a little migration
bq. > script that adds it, or on start of the new version, check for it and if
bq. > not present, create it.
That's right. But the AddColumn operation requires the table to be disabled,
and the ROOT table cannot be disabled once the system is started. How, then,
could we execute the migration script, or check for and create the column
family on start of the new version?
bq. On 2010-08-10 21:34:40, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 899
bq. > <http://review.cloudera.org/r/467/diff/3/?file=6005#file6005line899>
bq. >
bq. > Can the snapshot name be empty and then we'll make one up?
A default snapshot name? Or an auto-generated one, such as one based on the
creation time?
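One possible scheme, as a minimal sketch: derive the name from the table name plus the creation timestamp when the caller passes an empty name. The class and method names here are hypothetical, not part of HBaseAdmin.

```java
// Hypothetical sketch of an auto-generated default snapshot name; the
// "<table>-snapshot-<creationTime>" format is an assumption for
// illustration, not the actual HBase naming convention.
public class SnapshotNames {
    static String defaultName(String tableName, long creationTime) {
        // creationTime keeps concurrent snapshots of one table distinct
        return tableName + "-snapshot-" + creationTime;
    }

    public static void main(String[] args) {
        System.out.println(defaultName("mytable", System.currentTimeMillis()));
    }
}
```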
bq. On 2010-08-10 21:34:40, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 951
bq. > <http://review.cloudera.org/r/467/diff/3/?file=6005#file6005line951>
bq. >
bq. > For restore of the snapshot, do you use loadtable.rb or Todd's new
bq. > bulkloading scripts?
Currently, no.
A snapshot is composed of a list of log files and a set of reference files for
the HFiles of the table. These reference files have the same hierarchy as the
original table, and their names are in the format "1239384747630.tablename",
where the front part is the file name of the referred HFile and the latter
part is the table name of the snapshot. Thus, to restore a snapshot, we just
copy the reference files (which are only a few bytes each) to the table dir,
update .META. and split the logs. When the table is enabled, the system knows
how to replay the commit edits and how to read such reference files. The
methods getReferredToFile and open in StoreFile are updated to deal with this
kind of reference file.
At present, a snapshot can only be restored to a table whose name is the same
as the one the snapshot was created for. That means the old table with the
same name must be deleted before restoring a snapshot; that's what I do in the
unit test TestAdmin. Restoring a snapshot to a different table name has a low
priority and has not been implemented yet.
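The naming scheme described above splits at the first dot. A minimal illustrative parser (the class and method are hypothetical, not the actual StoreFile.getReferredToFile implementation) might look like:

```java
// Illustrative parser for the "<hfile-name>.<table-name>" reference-file
// naming scheme described above; names here are assumptions, not HBase API.
public class SnapshotRefName {
    static String[] parse(String refName) {
        int dot = refName.indexOf('.');
        // front part: file name of the referred HFile;
        // latter part: table name of the snapshot
        return new String[] { refName.substring(0, dot),
                              refName.substring(dot + 1) };
    }

    public static void main(String[] args) {
        String[] parts = parse("1239384747630.tablename");
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```

Splitting at the first dot works because table names of this era may not contain '.'.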
bq. On 2010-08-10 21:34:40, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 50
bq. > <http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line50>
bq. >
bq. > Whats this? A different kind of reference?
Yes. This is the reference file in a snapshot; it references an HFile of the
original table.
bq. On 2010-08-10 21:34:40, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java, line 115
bq. > <http://review.cloudera.org/r/467/diff/3/?file=6018#file6018line115>
bq. >
bq. > This looks like a class that you could write a unit test for?
Sure, I'll add another case in TestLogsCleaner.
bq. On 2010-08-10 21:34:40, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java, line 130
bq. > <http://review.cloudera.org/r/467/diff/3/?file=6017#file6017line130>
bq. >
bq. > If the table were big, this could be prohibitively expensive? A
bq. > single-threaded copy of all of a table's data? We could complement this
bq. > with MR-based restore, something that did the copy using MR?
This method is only used in RestoreSnapshot, where the reference files of the
snapshot are copied to the table dir. These reference files contain just a few
bytes each, not the table's data; snapshots share the table data with the
original table and with other snapshots. Do we still need an MR job?
bq. On 2010-08-10 21:34:40, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java, line 212
bq. > <http://review.cloudera.org/r/467/diff/3/?file=6013#file6013line212>
bq. >
bq. > Why Random negative number? Why not just leave it blank?
If a blank value were used as the key, there would be only one item left after
the first few scans of the regions. A random negative number indicates that a
region has never been scanned before; once a region has been scanned, there is
a last-checking time for it instead.
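The idea above can be sketched as follows (this is an illustration of the sentinel scheme, not the actual BaseScanner code; the method name is hypothetical):

```java
import java.util.Random;

// Sketch: a distinct random negative number marks a region as "never
// scanned", so unscanned regions do not collapse into a single entry the
// way one shared blank key would, while real last-checking times are
// non-negative and therefore always distinguishable from the sentinel.
public class ScanKeys {
    static long scanKey(Long lastCheckingTime, Random rnd) {
        if (lastCheckingTime != null) {
            return lastCheckingTime;            // region was scanned before
        }
        // OR-ing in the sign bit guarantees a negative value
        return rnd.nextLong() | Long.MIN_VALUE;
    }
}
```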
bq. On 2010-08-10 21:34:40, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, line 251
bq. > <http://review.cloudera.org/r/467/diff/3/?file=6012#file6012line251>
bq. >
bq. > Is this comment right?
I just renamed the Range values to caps; the comment was not changed.
bq. On 2010-08-10 21:34:40, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 149
bq. > <http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line149>
bq. >
bq. > Hmm... is this good? You are dropping some of the region name when you
bq. > toString. Do we have to?
This has not been changed; I just renamed the field "region" to "range".
bq. On 2010-08-10 21:34:40, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 156
bq. > <http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line156>
bq. >
bq. > This could be a problem when fellas migrate old data to use this new
bq. > hbase. If there are References in the old data, then this deserialization
bq. > will fail? I'm fine w/ you creating a new issue named something like
bq. > "Migration from 0.20.x hbase to 0.90" and adding a note in there that we
bq. > need to consider this little issue. Better though would be if the data
bq. > was able to migrate itself at runtime; i.e. recognize a boolean on the
bq. > stream and then deserialize the old style into the new, etc.
Actually I think it is fine to migrate old data to the new hbase. Old
references were serialized with DataOutput.writeBoolean(boolean), which writes
the value (byte)1 if the argument is true and (byte)0 if it is false.
See (from Ted's review):
http://download.oracle.com/javase/1.4.2/docs/api/java/io/DataOutput.html#writeBoolean%28boolean%29
Thus the value (byte)1 was written if it was the top file region
(isTopFileRegion is true), which is exactly the current value of TOP. For the
same reason, this deserialization would work for the references in the old
data, right?
That is also why we cannot use the ordinal of the Enum and serialize it as an
int: the serialized size of this field would differ between new data and old
data if an int were used.
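The compatibility argument can be checked with a small demo. The Range enum below is a stand-in for the actual Reference field, but the DataOutput.writeBoolean behavior (a single byte: 1 for true, 0 for false) is as specified by the JDK:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Demonstrates the wire-compatibility point above: writeBoolean emits one
// byte, 1 for true and 0 for false, so a new reader that treats byte 1 as
// TOP decodes old boolean-serialized references correctly.
public class BooleanCompat {
    enum Range { BOTTOM, TOP } // stand-in for the renamed "range" field

    // Old-style serialization: Reference wrote isTopFileRegion as a boolean
    static byte serializeOld(boolean isTopFileRegion) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeBoolean(isTopFileRegion);
            return bos.toByteArray()[0]; // exactly one byte on the stream
        } catch (IOException e) {
            throw new RuntimeException(e); // never happens for a byte array
        }
    }

    // New-style deserialization: byte 1 means TOP, byte 0 means BOTTOM
    static Range deserializeNew(byte b) {
        return b == 1 ? Range.TOP : Range.BOTTOM;
    }

    public static void main(String[] args) {
        System.out.println(deserializeNew(serializeOld(true)));  // TOP
        System.out.println(deserializeNew(serializeOld(false))); // BOTTOM
    }
}
```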
- Chongxin
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review823
-----------------------------------------------------------
> Snapshot of table
> -----------------
>
> Key: HBASE-50
> URL: https://issues.apache.org/jira/browse/HBASE-50
> Project: HBase
> Issue Type: New Feature
> Reporter: Billy Pearson
> Assignee: Li Chongxin
> Priority: Minor
> Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot
> Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class
> Diagram.png
>
>
> Having an option to take a snapshot of a table would be very useful in
> production.
> What I would like to see this option do is a merge of all the data into
> one or more files stored in the same folder on the dfs. This way we could
> save data in case of a software bug in hadoop or user code.
> The other advantage would be the ability to export a table to multiple
> locations. Say I had a read-only table that must be online. I could take a
> snapshot of it when needed, export it to a separate data center and have it
> loaded there; then I would have it online at multiple data centers for load
> balancing and failover.
> I understand that hadoop removes the need for backups to protect against
> failed servers, but this does not protect us from software bugs that might
> delete or alter data in ways we did not plan. We should have a way to roll
> back a dataset.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.