[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot

stack (JIRA) Mon, 12 Sep 2016 11:18:32 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484865#comment-15484865
 ]


stack commented on HBASE-7912:
------------------------------

I reviewed the doc. It is nice, high-level, and pitched properly at a 'user'. It 
is in msword format. Can we do it up in refguide format and post it as a patch on 
rb? There are some minor issues that would be better addressed via comments up on 
rb. Looks like backup is well worth its own dedicated chapter.

Nit: On usage, the backupid exists nowhere in the system except as output from 
the backup command? Later it is explained what it is which is helpful. Also, 
later again it is explained that it does live as part of the backup history. 
Could be good to call out these facts earlier.

Could also say roughly how long a backup is going to take. As an operator, I 
would be afraid to run a backup because I'd think the command could run 
forever on my 100 node cluster. Reading later in the manual, it seems that it 
could start and return immediately and then I check in on status. Would be good 
to surface some of that up here at the start of the doc to allay the fears of the 
poor operator.

I tried running this feature by checking out the branch. I built and started it 
up. In the logs I see:

2016-09-12 09:31:35,927 ERROR [ProcedureExecutor-3] master.TableStateManager: Unable to get table hbase:backup state
org.apache.hadoop.hbase.TableNotFoundException: hbase:backup
        at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:174)
        at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:131)
        at org.apache.hadoop.hbase.master.AssignmentManager.isDisabledorDisablingRegionInRIT(AssignmentManager.java:1221)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:739)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1567)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1546)
        at org.apache.hadoop.hbase.util.ModifyRegionUtils.assignRegions(ModifyRegionUtils.java:254)
        at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.assignRegions(CreateTableProcedure.java:430)
        at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:127)
        at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:57)
        at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:452)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1066)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:855)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:808)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:75)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:494)

... a few times. That bad?

A restart seemed to come up w/o this issue.

Poking at the command-line, the usage is nice but a bit wonky in the listing... 
could do w/ some cleanup:

{code}
kalashnikov:hbase.git.commit2 stack$ ./bin/hbase backup create
ERROR: wrong number of arguments
Usage: hbase backup create <type> <BACKUP_ROOT> [tables] [-s name] [-convert] [-silent] [-w workers][-b bandwith]
 type          "full" to create a full backup image;
               "incremental" to create an incremental backup image
  BACKUP_ROOT   The full root path to store the backup image,
                    the prefix can be hdfs, webhdfs or gpfs
 Options:
  tables      If no tables ("") are specified, all tables are backed up. Otherwise it is a
               comma separated list of tables.
 -w          number of parallel workers.
 -b          bandwith per one worker (in MB sec)
 -set        name of backup set
{code}

-convert is unexplained, as is -silent (I can guess what the latter means).

I would have liked file: scheme as an option for backup location if only for 
testing purposes (the timelinev2 folks might like this too....). I can file an 
issue.

Are the -w workers threads or mapreduce tasks? That's what I asked myself when I 
saw it.

It would be great to work the doc's explanation of each of these options back 
into the command usage. More folks will read the cmd output than the doc 
(unfortunately). E.g. the doc explains what the -w option is about where the 
usage output does not.

Does that '-b' for bandwidth actually work? If so, how?

I get a different usage listing when there is an error:

kalashnikov:hbase.git.commit2 stack$ ./bin/hbase backup create full http://tmp -s first
2016-09-12 09:55:01,437 ERROR [main] util.AbstractHBaseTool: Error when parsing command-line arguments
org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -s
        at org.apache.commons.cli.Parser.processOption(Parser.java:363)
        at org.apache.commons.cli.Parser.parse(Parser.java:199)
        at org.apache.commons.cli.Parser.parse(Parser.java:85)
        at org.apache.hadoop.hbase.util.AbstractHBaseTool.parseArgs(AbstractHBaseTool.java:135)
        at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:94)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:133)
usage: bin/hbase org.apache.hadoop.hbase.backup.BackupDriver <options>
Options:
 -all          All tables
 -b <arg>      Bandwidth (MB/s)
 -debug        Enable debug loggings
 -h,--help     Show usage
 -n <arg>      History length
 -path <arg>   Backup destination root directory path
 -set <arg>    Backup set name
 -t <arg>      Table name
 -w <arg>      Number of workers

Should the two usage listings be the same, so as not to confuse? (I was 
thinking I'd run the wrong tool.)

So, I'm passing -s but it looks like I should pass -set. Why are some options a 
single letter with a hyphen while others are whole words (and they don't do the 
unix-y thing of requiring a double hyphen)?
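
To make the point concrete, here is a minimal sketch of one way to keep the usage text and the accepted options from drifting apart: generate both from a single option table. This is not the BackupDriver code. The class name `OptionTableSketch` and its methods are hypothetical, it uses plain Java rather than commons-cli, and only the option names and descriptions are copied from the listing above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: one option table feeds both the parser and the usage text. */
public class OptionTableSketch {
    /** Option name (without the hyphen) -> description, copied from the listing above. */
    static final Map<String, String> OPTIONS = new LinkedHashMap<>();
    static {
        OPTIONS.put("set", "Backup set name");
        OPTIONS.put("w", "Number of workers");
        OPTIONS.put("b", "Bandwidth (MB/s)");
    }

    /** Usage text is generated from the table, so it always matches what parse() accepts. */
    static String usage() {
        StringBuilder sb = new StringBuilder("Options:\n");
        for (Map.Entry<String, String> e : OPTIONS.entrySet()) {
            sb.append(String.format(" -%-12s%s\n", e.getKey() + " <arg>", e.getValue()));
        }
        return sb.toString();
    }

    /** Parses "-name value" pairs; anything not in the table is rejected with the same usage. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            String name = args[i].startsWith("-") ? args[i].substring(1) : null;
            if (name == null || !OPTIONS.containsKey(name) || i + 1 >= args.length) {
                throw new IllegalArgumentException(
                    "Unrecognized or incomplete option: " + args[i] + "\n" + usage());
            }
            parsed.put(name, args[++i]); // consume the option's value
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {"-set", "first", "-w", "4"}));
        try {
            parse(new String[] {"-s", "first"}); // the mistake made above
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With something like this, passing -s would fail with a message that lists -set, and there would be only one listing to keep up to date.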

When I ran the command, I got:

{code}
kalashnikov:hbase.git.commit2 stack$ ./bin/hbase backup create full file:///tmp -set first
2016-09-12 09:57:47,923 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-09-12 09:57:48,829 ERROR [main] util.AbstractHBaseTool: Error running command-line tool
java.io.IOException: Backup set 'first' is either empty or does not exist
        at org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:188)
        at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:113)
        at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:128)
        at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:133)
{code}

I'd guess it is the scheme that is disallowed but the complaint is that 'first' 
does not exist.

Rereading the doc, I got the sense that I had to first add my 'set' before I 
could refer to it, so I tried the following and got the output below:

kalashnikov:hbase.git.commit2 stack$ ./bin/hbase backup set
2016-09-12 10:02:27,178 ERROR [main] util.AbstractHBaseTool: Error running command-line tool
java.io.IOException: command line format
        at org.apache.hadoop.hbase.backup.impl.BackupCommands$BackupSetCommand.execute(BackupCommands.java:469)
        at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:113)
        at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:128)
        at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:133)

Then I thought it was just that there was no help on 'set'... so:

{code}
kalashnikov:hbase.git.commit2 stack$ ./bin/hbase backup set add first
2016-09-12 10:04:07,068 ERROR [main] util.AbstractHBaseTool: Error running command-line tool
java.lang.RuntimeException: Wrong args
        at org.apache.hadoop.hbase.backup.impl.BackupCommands$BackupSetCommand.processSetAdd(BackupCommands.java:559)
        at org.apache.hadoop.hbase.backup.impl.BackupCommands$BackupSetCommand.execute(BackupCommands.java:477)
        at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:113)
        at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:128)
        at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:133)
{code}

I see what I did wrong now but I'd think that I'd get some usage on the set 
command...

Hm... I tried '-h' and that seems to work (I'd suggest it would be more user 
friendly if running with no args dumped help)... but no, the usage shown is for 
a different command than set:

{code}
$ ./bin/hbase backup set -h
usage: bin/hbase org.apache.hadoop.hbase.backup.BackupDriver <options>
Options:
 -all          All tables
 -b <arg>      Bandwidth (MB/s)
 -debug        Enable debug loggings
 -h,--help     Show usage
 -n <arg>      History length
 -path <arg>   Backup destination root directory path
 -set <arg>    Backup set name
 -t <arg>      Table name
 -w <arg>      Number of workers
{code}
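
The behaviour asked for here (no args, or -h, printing help for the subcommand that was actually typed) can be sketched as below. This is a hypothetical illustration, not the BackupDriver implementation: the class `SubcommandUsageSketch` and the `dispatch` method are invented, and the usage string for 'set' is a placeholder because its real argument list does not appear anywhere in this comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: each subcommand owns its usage text and prints it on no args or -h. */
public class SubcommandUsageSketch {
    static final Map<String, String> USAGE = new LinkedHashMap<>();
    static {
        // 'create' is abbreviated from the tool output quoted earlier; 'set' is a placeholder.
        USAGE.put("create", "Usage: hbase backup create <type> <BACKUP_ROOT> [tables] ...");
        USAGE.put("set", "Usage: hbase backup set <placeholder arguments>");
    }

    /** Returns the text the tool would print instead of a stack trace. */
    static String dispatch(String[] args) {
        if (args.length == 0 || !USAGE.containsKey(args[0])) {
            // Unknown or missing subcommand: list what is available.
            return "Commands: " + String.join(", ", USAGE.keySet());
        }
        String command = args[0];
        boolean wantsHelp =
            args.length == 1 || args[1].equals("-h") || args[1].equals("--help");
        if (wantsHelp) {
            return USAGE.get(command); // help for this subcommand, not the generic driver usage
        }
        return "would run '" + command + "' with " + (args.length - 1) + " argument(s)";
    }

    public static void main(String[] args) {
        System.out.println(dispatch(new String[] {"set"}));
        System.out.println(dispatch(new String[] {"set", "-h"}));
        System.out.println(dispatch(new String[] {}));
    }
}
```

The design choice is simply that the help lookup is keyed by the subcommand, so 'set' and 'set -h' give the same answer and neither falls through to the driver-wide option list.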

I tried the history command. It emitted nothing. I added -h and got the above.

Is 'history' the 'list' of backups taken? Are they the same thing?

I think these command-line tools need to run smoothly. If they don't, no 
operator is going to trust the rest of the backup/restore tooling chain.

I don't have access to a little cluster just yet. Will be back when I have an 
hdfs to copy backups up to so I can test other commands.

Also, it is critical to add more detail on how it works, plus a limitations 
section as suggested by [~mbertozzi] up on the dev mailing list thread.

The 'backup scenario' on the end of the doc is great (the doc in general is 
really good).







> HBase Backup/Restore Based on HBase Snapshot
> --------------------------------------------
>
>                 Key: HBASE-7912
>                 URL: https://issues.apache.org/jira/browse/HBASE-7912
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Richard Ding
>            Assignee: Vladimir Rodionov
>              Labels: backup
>             Fix For: 2.0.0
>
>         Attachments: Backup-and-Restore-Apache_9Sep2016.pdf, 
> HBaseBackupAndRestore - v0.8.pdf, HBaseBackupAndRestore -0.91.pdf, 
> HBaseBackupAndRestore-v0.9.pdf, HBaseBackupAndRestore.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, 
> HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, 
> HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, 
> HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf
>
>
> Finally, we completed the implementation of our backup/restore solution, and 
> would like to share it with the community through this jira. 
> We are leveraging the existing hbase snapshot feature to provide a general 
> solution for common users. Our full backup uses snapshot to capture 
> metadata locally and exportsnapshot to move data to another cluster; 
> the incremental backup uses offline-WALPlayer to back up HLogs; we also 
> leverage distributed log roll and flush to improve performance. Other 
> added value includes convert, merge, progress report, and CLI commands, so 
> that a common user can back up hbase data without in-depth knowledge of hbase. 
>  Our solution also contains some usability features for enterprise users. 
> The detailed design document and CLI commands will be attached to this jira. We 
> plan to use 10~12 subtasks to share each of the following features, and 
> document the detailed implementation in the subtasks: 
> * *Full Backup* : provide local and remote back/restore for a list of tables
> * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental 
> backup)
> * *distributed* Logroll and distributed flush 
> * Backup *Manifest* and history
> * *Incremental* backup: to build on top of full backup as daily/weekly backup 
> * *Convert*  incremental backup WAL files into hfiles
> * *Merge* several backup images into one(like merge weekly into monthly)
> * *add and remove* table to and from Backup image
> * *Cancel* a backup process
> * backup progress *status*
> * full backup based on *existing snapshot*
> *-------------------------------------------------------------------------------------------------------------*
> *Below is the original description, to keep here as the history for the 
> design and discussion back in 2013*
> There have been attempts in the past to come up with a viable HBase 
> backup/restore solution (e.g., HBASE-4618).  Recently, there are many 
> advancements and new features in HBase, for example, FileLink, Snapshot, and 
> Distributed Barrier Procedure. This is a proposal for a backup/restore 
> solution that utilizes these new features to achieve better performance and 
> consistency. 
>  
> A common practice of backup and restore in databases is to first take a full 
> baseline backup, and then periodically take incremental backups that capture 
> the changes since the full baseline backup. An HBase cluster can store a 
> massive amount of data.  A combination of full backups with incremental 
> backups has tremendous benefit for HBase as well.  The following is a typical 
> scenario for full and incremental backup.
> # The user takes a full backup of a table or a set of tables in HBase. 
> # The user schedules periodical incremental backups to capture the changes 
> from the full backup, or from last incremental backup.
> # The user needs to restore table data to a past point of time.
> # The full backup is restored to the table(s) or to different table name(s).  
> Then the incremental backups that are up to the desired point in time are 
> applied on top of the full backup. 
> We would support the following key features and capabilities.
> * Full backup uses HBase snapshot to capture HFiles.
> * Use HBase WALs to capture incremental changes, but we use bulk load of 
> HFiles for fast incremental restore.
> * Support single table or a set of tables, and column family level backup and 
> restore.
> * Restore to different table names.
> * Support adding additional tables or CF to backup set without interruption 
> of incremental backup schedule.
> * Support rollup/combining of incremental backups into longer period and 
> bigger incremental backups.
> * Unified command line interface for all the above.
> The solution will support HBase backup to FileSystem, either on the same 
> cluster or across clusters.  It has the flexibility to support backup to 
> other devices and servers in the future.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
