[ 
https://issues.apache.org/jira/browse/HBASE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508128#comment-13508128
 ] 

Andrew Purtell commented on HBASE-7212:
---------------------------------------

bq. The main questions I had when I was initially understanding the previous 
implementation were "Is this 2pc?" and "Do we need 2pc?". The answers are: what 
we have implemented here has two phases but is not true two-phase commit. 2pc, 
as defined in the literature 
(http://www.cs.berkeley.edu/~brewer/cs262/Aries.pdf), requires that once the 
coordinator says something is committed, any failures at a member or 
coordinator must be recovered by failing forward and completing it. The key 
point here is that while we will need a global barrier for one of the snapshot 
flavors (global), we don't need full 2PC because 1) we don't need to undo 
work (like a log roll or flush) if some sub part of the first phase (our 
acquire/2pc's prepare) fails, and because 2) we don't need to recover by failing 
forward if anything fails in the second phase (our release/2pc's commit). In 
the latter case we just fail and delete .snapshot/.tmp remnants in the fs, and 
carry on with extra flushed/rolled hlogs.

+1 

This makes a good case. I like the "keep it as simple as possible and only do 
as much as we actually need to" approach.

I can see a use for this in security too. We could tighten up the permissions 
cache using a barrier for grant and revoke ops. In other words, replace the 
current ZK watcher based permissions cache "RPC via ZK" with the Procedure 
mechanism that provides much the same, but with the added benefit that we can 
fail the grant or revoke op if one or more RSes fail to ack the update.
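
To make the grant/revoke idea concrete, here is a minimal sketch in Java. It is 
not HBase code: CacheMember, BarrieredGrant, and the string form of an update 
are hypothetical names invented for illustration, and the members are plain 
in-memory objects rather than region servers reached over ZK or RPC. The only 
point it shows is that an update becomes visible everywhere or the op fails.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for one region server's permissions cache; not an HBase class. */
final class CacheMember {
    final Set<String> staged = new HashSet<>();
    final Set<String> visible = new HashSet<>();
    private final boolean reachable;

    CacheMember(boolean reachable) { this.reachable = reachable; }

    /** Phase 1: stage the update and ack; an unreachable member never acks. */
    boolean acquire(String update) {
        if (!reachable) return false;
        staged.add(update);
        return true;
    }

    /** Phase 2: make a staged update visible to permission checks. */
    void release(String update) {
        if (staged.remove(update)) visible.add(update);
    }

    /** Drop a staged update after the operation failed. */
    void abort(String update) { staged.remove(update); }
}

/** Hypothetical coordinator for a grant or revoke op. */
final class BarrieredGrant {
    /** Returns true only if every member acked; otherwise nothing becomes visible. */
    static boolean apply(String update, List<CacheMember> members) {
        List<CacheMember> acked = new ArrayList<>();
        for (CacheMember m : members) {
            if (!m.acquire(update)) {
                // One missing ack fails the whole op; clean up members that staged it.
                for (CacheMember a : acked) a.abort(update);
                return false;
            }
            acked.add(m);
        }
        // Barrier reached: every member holds the staged update, so publish it.
        for (CacheMember m : members) m.release(update);
        return true;
    }
}
```

Because no undo of real work is needed on failure, the cleanup is just 
discarding staged state, which is the same "fail and carry on" property the 
quoted text describes for snapshots.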
                
> Globally Barriered Procedure mechanism
> --------------------------------------
>
>                 Key: HBASE-7212
>                 URL: https://issues.apache.org/jira/browse/HBASE-7212
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>    Affects Versions: hbase-6055
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: hbase-6055
>
>         Attachments: 121127-global-barrier-proc.pdf, hbase-7212.patch, 
> pre-hbase-7212.patch
>
>
> This is a simplified version of what was proposed in HBASE-6573.  Instead of 
> claiming to be a 2pc or 3pc implementation (which implies logging at each 
> actor, and recovery operations) this just provides a best effort global 
> barrier mechanism called a Procedure.  
> Users need only to implement methods to acquireBarrier, to act when 
> insideBarrier, and to releaseBarrier that use the ExternalException 
> cooperative error checking mechanism.
> Globally consistent snapshots require the ability to quiesce writes to a set 
> of region servers before the snapshot operation is executed.  Also if any 
> node fails, the mechanism needs to be able to notify the others so that they 
> abort.
> The first cut of other online snapshots doesn't need the full barrier but may 
> still use this for its error propagation mechanisms.
> This version removes the extra layer incurred in the previous implementation 
> due to the use of generics, separates the coordinator and members, and 
> reduces the amount of inheritance used in favor of composition.
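
The three steps named in the description can be sketched roughly as follows. 
This is an illustration under stated assumptions, not the API in the attached 
patch: BarrierSteps, ErrorMonitor, RecordingMember, and SimpleProcedure are 
hypothetical names, the members run sequentially in one JVM, and a plain 
Exception stands in for the ExternalException mechanism. Calling 
releaseBarrier on every member after a failure is also an assumption about how 
cleanup might work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical member-side hooks, named after the three steps in the description. */
interface BarrierSteps {
    void acquireBarrier() throws Exception;
    void insideBarrier() throws Exception;
    void releaseBarrier();
}

/** Shared error slot checked before each step (cooperative error checking). */
final class ErrorMonitor {
    private final AtomicReference<Exception> failure = new AtomicReference<>();

    /** Records the first error reported by any participant. */
    void receive(Exception e) { failure.compareAndSet(null, e); }

    void rethrowIfFailed() throws Exception {
        Exception e = failure.get();
        if (e != null) throw e;
    }

    boolean hasFailed() { return failure.get() != null; }
}

/** Test member that records which steps ran and can be told to fail on acquire. */
final class RecordingMember implements BarrierSteps {
    final List<String> log = new ArrayList<>();
    private final boolean failOnAcquire;

    RecordingMember(boolean failOnAcquire) { this.failOnAcquire = failOnAcquire; }

    public void acquireBarrier() {
        if (failOnAcquire) throw new IllegalStateException("could not acquire");
        log.add("acquire");
    }
    public void insideBarrier() { log.add("inside"); }
    public void releaseBarrier() { log.add("release"); }
}

final class SimpleProcedure {
    /** Best effort: returns false on the first error, with no recovery or undo. */
    static boolean run(List<? extends BarrierSteps> members, ErrorMonitor monitor) {
        try {
            for (BarrierSteps m : members) { monitor.rethrowIfFailed(); m.acquireBarrier(); }
            // Global barrier: every member has acquired before any acts inside it.
            for (BarrierSteps m : members) { monitor.rethrowIfFailed(); m.insideBarrier(); }
            return true;
        } catch (Exception e) {
            monitor.receive(e); // later checks by any participant will see the failure
            return false;
        } finally {
            for (BarrierSteps m : members) m.releaseBarrier();
        }
    }
}
```

Note that nothing here logs state or replays work after a crash, which is why 
the description avoids calling it 2pc or 3pc.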
