[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273371#comment-16273371
 ] 

Mike Drob commented on HBASE-17852:
-----------------------------------

bq. The reason is the simplicity of the implementation. Is not this obvious?
It's not obvious, hence the need for clarifying questions. We're all 
collaborators here, Vlad, not adversaries. I haven't reviewed the code, so in 
this instance I'm a messenger and attempted mediator.

bq. Should I have spent time trying to implement Tx management instead?
Maybe!

bq. Did I answer original question? I thought that we are technical guys and we 
need technical answers. It seems that I was wrong.
I'm reminded of advice I got early in my software engineering career: it's easy 
to write code, it's less easy to write correct code, and it is actively hard to 
know which code to write.

The technical answer may have been obvious, as you assert, but it's not a 
complete answer. Understanding how the operators will need to use this feature 
and how they will interact with it is important in building something that is 
useful to them.

bq. User intervention is required only if user kills backup process or it dies 
on a client side, for some other reason.
There are lots of reasons that a process might die on the client side. It seems 
we may disagree on how often that happens.

bq. Do not we still have hbck for this reason?
Sure, we can extend hbck to take care of these failures as well. Does it 
currently do so? I have no idea. Probably not, given that I don't think hbck 
works with hbase-2.0 due to AMv2.

bq. Moving feature out of beta-1 only because someone does not like attitude of 
a contributor
It seems like the feature is being moved out because it's incomplete...

And some earlier comments:
bq. But for lazy operators...
Lazy operators are the best kind. They are the ones that automate things, the 
ones that prepare and test for failure so that they don't get called in the 
middle of the night, the ones that actually make sure that the ship stays 
sailing.

bq. The patch is no 8 already
I'm not sure what this is intended to prove. Sometimes I get patches right on 
the first try, sometimes it takes twenty tries.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-17852
>                 URL: https://issues.apache.org/jira/browse/HBASE-17852
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> Design approach (rollback-via-snapshot) implemented in this ticket:
> # Before a backup create/delete/merge starts, we take a snapshot of the 
> backup meta-table (backup system table). This procedure is lightweight 
> because the meta-table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by 
> cleaning up partial data in the backup destination and then restoring the 
> backup meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for 
> example), the next time the user tries create/merge/delete, they will see an 
> error message saying that the system is in an inconsistent state and repair 
> is required; they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and 
> the BackupObservers), we introduce a small table used ONLY to keep the list 
> of bulk loaded files. All backup observers will work only with this new 
> table. The reason: in case of a failure during backup 
> create/delete/merge/restore, when the system performs automatic rollback, 
> some data written by backup observers during the failed operation may be 
> lost. This is what we try to avoid.
> # The second table keeps only bulk-load-related references. We do not care 
> about the consistency of this table, because bulk load is an idempotent 
> operation and can be repeated after a failure. Partially written data in the 
> second table does not affect the BackupHFileCleaner plugin, because this data 
> (the list of bulk loaded files) corresponds to files which have not yet been 
> loaded successfully and hence are not visible to the system.
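
To make the client-side failure case in the description above concrete, here is a minimal in-memory sketch of rollback-via-snapshot. It is a toy model only: the class BackupMetaSketch and all of its methods are hypothetical names made up for illustration, not the HBase backup API and not the code in the attached patches. It assumes the meta-table can be modelled as a simple key/value map, and it does not model the cleanup of partial data in the backup destination.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy in-memory model of rollback-via-snapshot. Every name here is
// hypothetical; none of this is the HBase backup API.
final class BackupMetaSketch {
    // Stands in for the backup system table (meta-table).
    private final Map<String, String> metaTable = new HashMap<>();
    // Stands in for the separate table that only backup observers write to.
    private final Set<String> bulkLoadTable = new HashSet<>();
    // Copy taken before an operation; non-null means an operation never finished.
    private Map<String, String> snapshot;

    // Start an operation: refuse if a previous one was left unfinished,
    // otherwise snapshot the meta-table.
    void begin() {
        if (snapshot != null) {
            throw new IllegalStateException("inconsistent state, run repair first");
        }
        snapshot = new HashMap<>(metaTable);
    }

    void put(String key, String value) { metaTable.put(key, value); }

    String get(String key) { return metaTable.get(key); }

    // Operation succeeded: the snapshot is no longer needed.
    void commit() { snapshot = null; }

    // Restore the meta-table from the snapshot, if there is one.
    void repair() {
        if (snapshot == null) {
            return;
        }
        metaTable.clear();
        metaTable.putAll(snapshot);
        snapshot = null;
    }

    // Observers write here, so a meta-table rollback never touches their data.
    void recordBulkLoad(String hfile) { bulkLoadTable.add(hfile); }

    boolean hasBulkLoad(String hfile) { return bulkLoadTable.contains(hfile); }

    public static void main(String[] args) {
        BackupMetaSketch s = new BackupMetaSketch();
        s.put("backup_1", "COMPLETE");
        s.begin();
        s.put("backup_2", "RUNNING");
        s.recordBulkLoad("hfile-a");
        // The client "dies" here: commit() is never called.
        try {
            s.begin();
        } catch (IllegalStateException e) {
            System.out.println("blocked: " + e.getMessage());
        }
        s.repair();
        System.out.println("backup_2 after repair: " + s.get("backup_2"));
        System.out.println("bulk load kept: " + s.hasBulkLoad("hfile-a"));
    }
}
```

In this model the next begin() after a crash is rejected until repair() runs, which is the manual step under discussion in this thread. The bulk-load record survives the rollback only because it lives outside the snapshotted table.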



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
