[
https://issues.apache.org/jira/browse/HBASE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508125#comment-13508125
]
Jonathan Hsieh commented on HBASE-7212:
---------------------------------------
Thanks for taking a look. I'll get another rev out with cleaned up
documentation in a day or two. Answers below.
bq. Do you have to write it yourself? Anything already available that you could
use? Say we used the zk curator client, it has a few barriers implemented
already: https://github.com/Netflix/curator/wiki/Recipes If we were using
curator say, could you use these recipes as building blocks so you didn't have
to write this yourself? (This feature has to be backportable to 0.94?)
This is a simplified version of Jesse's patch. I just gave curator a quick
look; it is similar to the double barrier
(https://github.com/Netflix/curator/wiki/Double-barrier). If it is implemented
as the recipe you pointed out, I think we'd still need to add in the ability
for cancellation/abort to come from any of the members.
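To make the missing piece concrete, here is a minimal in-process sketch of a double barrier that any participant can abort. It is not the patch and not Curator's recipe; the class AbortableDoubleBarrier and all of its method names are made up for illustration, and it uses plain Java monitors instead of zk.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process double barrier that any participant can abort. */
class AbortableDoubleBarrier {
    private final Set<String> expected;
    private final Set<String> acquired = new HashSet<>();
    private final Set<String> released = new HashSet<>();
    private String abortReason; // stays null while nobody has aborted

    AbortableDoubleBarrier(Set<String> members) {
        this.expected = new HashSet<>(members);
    }

    synchronized void memberAcquired(String member) { acquired.add(member); notifyAll(); }

    synchronized void memberReleased(String member) { released.add(member); notifyAll(); }

    /** The coordinator or any member may call this; the first reason wins. */
    synchronized void abort(String who, String why) {
        if (abortReason == null) abortReason = who + ": " + why;
        notifyAll();
    }

    synchronized String abortReason() { return abortReason; }

    /** Returns true once every member has acquired, false on abort or timeout. */
    boolean awaitAllAcquired(long timeoutMs) { return awaitAll(acquired, timeoutMs); }

    /** Returns true once every member has released, false on abort or timeout. */
    boolean awaitAllReleased(long timeoutMs) { return awaitAll(released, timeoutMs); }

    private synchronized boolean awaitAll(Set<String> arrived, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (abortReason == null && !arrived.containsAll(expected)) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) { abort("barrier", "timed out"); break; }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                // Treat an interrupt like any other failure: the whole attempt aborts.
                Thread.currentThread().interrupt();
                abort("barrier", "interrupted");
            }
        }
        return abortReason == null;
    }
}
```

A waiter blocked in either await call wakes up as soon as anyone calls abort, which is the behavior the plain double barrier recipe doesn't seem to give us.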
bq. Reading the diagram, I'm not sure what receiveReached is. Or sendReached.
sendReached is the coordinator saying all participants responded/are
participating?
Yes -- reached is sent when the coordinator figures out that it has "reached"
the global barrier point because all members have taken their part of the
global barrier.
Basically, zk is being used for its async notifications and as the RPC
mechanism. Arrows into the ZK column are calls writing to ZK, arrows out of ZK
are callbacks being called at the target. So the red coordinator writes to zk
via sendStart, zk node creation triggers a startNewOperation callback on the
blue member1, and similarly on the green member2. These names are shorthand
for the names in the review that was posted -- now sendStart ->
sendBarrierStart, sendReached -> sendBarrierReached, startNewOperation ->
Subprocedure's constructor + acquireBarrier, receiveReached ->
receiveReachedGlobalBarrier
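The message flow in the diagram can be sketched with a toy stand-in for zk, where creating a node fires the callbacks watching its parent. FakeNodeStore, BarrierFlowDemo and the /start, /acquired and /reached paths are assumptions made up for this sketch, not the patch's classes or znode layout, and the callbacks here run synchronously whereas real zk watches are asynchronous.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for zk: creating a node fires the watchers on its parent. */
class FakeNodeStore {
    private static final class Watch {
        final String parent;
        final Consumer<String> callback;
        Watch(String parent, Consumer<String> callback) {
            this.parent = parent;
            this.callback = callback;
        }
    }

    private final List<Watch> watches = new ArrayList<>();

    void watchChildren(String parent, Consumer<String> callback) {
        watches.add(new Watch(parent, callback));
    }

    /** An "arrow into ZK"; each matching callback is an "arrow out of ZK". */
    void create(String path) {
        String parent = path.substring(0, path.lastIndexOf('/'));
        for (Watch w : new ArrayList<>(watches)) {
            if (w.parent.equals(parent)) w.callback.accept(path);
        }
    }
}

class BarrierFlowDemo {
    /** Runs one sunny-day procedure and returns the order in which calls happened. */
    static List<String> run() {
        List<String> trace = new ArrayList<>();
        FakeNodeStore zk = new FakeNodeStore();
        List<String> members = Arrays.asList("member1", "member2");
        int[] acquiredCount = {0};

        for (String m : members) {
            // Member side: the start node triggers acquireBarrier, then reports back.
            zk.watchChildren("/start", path -> {
                trace.add(m + " acquireBarrier");
                zk.create("/acquired/" + m);
            });
            zk.watchChildren("/reached", path -> trace.add(m + " receiveReachedGlobalBarrier"));
        }
        // Coordinator side: once every member has acquired, the barrier is reached.
        zk.watchChildren("/acquired", path -> {
            if (++acquiredCount[0] == members.size()) {
                trace.add("coordinator sendBarrierReached");
                zk.create("/reached/snapshot121201");
            }
        });

        trace.add("coordinator sendBarrierStart");
        zk.create("/start/snapshot121201");
        return trace;
    }

    public static void main(String[] args) {
        for (String line : run()) System.out.println(line);
    }
}
```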
bq. On your barrier you say "...but does not support ACID semantics" and that's
ok because the 'transactions' we'll be running over this mechanism do not
require it? Because they can proceed and complete in any order and result will
come off the same?
Previously, this code was called TwoPhaseCommit (2pc). While it had two
phases, the code did not implement true two phase commit. The purpose of this
explicit comparison is to make clear 2pc's purpose (distributed ACID
guarantees), to point out that we don't have 2pc here, to point out that we
don't need 2pc here, and to point out that we just need a global barrier.
The online snapshot coordination does not need all of what 2pc provides. The
first cut will have "only on a sunny day" semantics -- e.g. it will only
succeed if everything succeeds and if anything fails along the way the whole
attempt will be aborted. This is ok because the durable work that a snapshot
does goes into a tmp dir (/hbase/.snapshots/.tmp/xxx) that is "committed" at the
end atomically via an HDFS dir rename, and the durable intermediate operations
(e.g. new files from forcing a hlog roll or hlog flush) don't need to be undone
to remain correct.
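The tmp-dir-then-rename idea can be sketched on a local filesystem with java.nio. This is only an analogy under stated assumptions: the real work goes through HDFS, and the class TmpDirCommit, the manifest file names and the failMidway flag are all invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical local-filesystem analogy for "work in .tmp, commit by rename". */
class TmpDirCommit {
    /** Returns true if the snapshot was published, false if it was aborted and cleaned up. */
    static boolean snapshot(Path root, String name, boolean failMidway) throws IOException {
        Path tmp = root.resolve(".tmp").resolve(name);
        Path done = root.resolve(name);
        Files.createDirectories(tmp);
        try {
            Files.write(tmp.resolve("region-a.manifest"), "a".getBytes(StandardCharsets.UTF_8));
            if (failMidway) throw new IOException("simulated member failure");
            Files.write(tmp.resolve("region-b.manifest"), "b".getBytes(StandardCharsets.UTF_8));
            // The only "commit" step: one rename makes the whole directory visible.
            Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (IOException e) {
            // Sunny-day semantics: no recovery, just remove the leftovers and fail.
            deleteRecursively(tmp);
            return false;
        }
    }

    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) return;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order deletes children before their parent directory.
            List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : paths) Files.delete(p);
        }
    }

    /** Runs one successful and one failed attempt; both flags should come back true. */
    static boolean[] demo() {
        try {
            Path root = Files.createTempDirectory("snapshots");
            boolean ok = snapshot(root, "snapshot121201", false);
            boolean failed = snapshot(root, "snapshot121202", true);
            return new boolean[] {
                ok && Files.isDirectory(root.resolve("snapshot121201")),
                !failed && !Files.exists(root.resolve("snapshot121202"))
                        && !Files.exists(root.resolve(".tmp").resolve("snapshot121202"))
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing outside the tmp dir is touched before the rename, so an abort only has to delete that directory.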
bq. You say "....Does not recover on failures" ... because the operation just
FAILs. Right?
Yup.
bq. Only one of these procedures can be ongoing at any one time? Is that right?
True for this first cut implementation, but not a fundamental limitation. This
actually gets enforced at the snapshot manager level which may be visible in
HBASE-7208 and definitely in HBASE-6866 when that gets posted. I believe as
implemented if we picked a different class we could have multiple different
kinds of procedure concurrently running on a different znode dir hierarchy.
bq. How do I read these set of slides? There is a 'Barrier Procedure
Coordination' and then there is 'Procedure Coordination'? So, the PC makes use
of a BPC? BPC is the skeleton you hang PC on?
All those are synonymous -- I've been using procedure as a shorthand. The code
implements one framework for a globally barriered procedure, and I've just
tried to call it 'procedure' and 'subprocedure' everywhere (though from review
I missed spots where it was called task, operation, or commit). This
'procedure' takes care of the global barrier coordination and cross process
error propagation.
bq. Why you say this 'If we aren’t doing proper 2PC do we need all this
infrastructure?'? Are you making a case for our not needing 2PC given what is
being implemented?
I could probably remove that line -- I'm now convinced that we need what this
code does.
The main questions I had when I was initially understanding the previous
implementation were "Is this 2pc?" and "Do we need 2pc?". The answers are: what
we have implemented here has two phases but is *not* true two-phase commit.
2pc, as defined in the literature
(http://www.cs.berkeley.edu/~brewer/cs262/Aries.pdf), requires that once the
coordinator says something is committed, any failures at a member or
coordinator must be recovered by failing forward and completing it. The key
point here is that while we will need a global barrier for one of the snapshot
flavors (global), it doesn't need full 2PC because 1) we don't need to undo
work (like a log roll or flush) if some sub part of the first phase (our
acquire/2pc's prepare) fails, and because 2) we don't need to recover by failing
forward if anything fails in the second phase (our release/2pc's commit). In
the latter case we just fail and delete .snapshot/.tmp remnants in the fs, and
carry on with extra flushed/rolled hlogs.
bq. Coordinator can be any client? Does not have to be master?
It could be anywhere, but currently for snapshots the coordinator lives on the
master.
bq. What is ProcedureCoordinateComms?
This is actually a layer that separates the zk code (the rpc communications or
comms code) from specific execution (snapshotting specific code). I could
probably remove it, but the abstraction allows for testing the core pieces
without zk.
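As a rough illustration of why the abstraction helps testing, here is a sketch of a coordinator whose only view of the transport is an interface, plus a recording fake that replaces zk in a test. The interface CoordinatorComms, the classes SimpleCoordinator and RecordingComms, and the sendAbort method are hypothetical; only the sendBarrierStart and sendBarrierReached names come from the review, and their real signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical coordinator-side transport; the real interface may look different. */
interface CoordinatorComms {
    void sendBarrierStart(String procName, List<String> members);
    void sendBarrierReached(String procName, List<String> members);
    void sendAbort(String procName, String reason);
}

/** Core coordination logic: knows nothing about zk. */
class SimpleCoordinator {
    private final CoordinatorComms comms;
    private final List<String> members;
    private final List<String> acquired = new ArrayList<>();
    private boolean finished; // set once reached or abort has been sent

    SimpleCoordinator(CoordinatorComms comms, List<String> members) {
        this.comms = comms;
        this.members = new ArrayList<>(members);
    }

    void start(String procName) {
        comms.sendBarrierStart(procName, members);
    }

    void memberAcquired(String procName, String member) {
        if (finished) return;
        if (!acquired.contains(member)) acquired.add(member);
        if (acquired.containsAll(members)) {
            finished = true;
            comms.sendBarrierReached(procName, members);
        }
    }

    void memberAborted(String procName, String member, String why) {
        if (finished) return;
        finished = true;
        comms.sendAbort(procName, member + ": " + why);
    }
}

/** Test double: records what would have been written to zk. */
class RecordingComms implements CoordinatorComms {
    final List<String> sent = new ArrayList<>();

    @Override
    public void sendBarrierStart(String procName, List<String> members) {
        sent.add("start " + procName);
    }

    @Override
    public void sendBarrierReached(String procName, List<String> members) {
        sent.add("reached " + procName);
    }

    @Override
    public void sendAbort(String procName, String reason) {
        sent.add("abort " + procName + " (" + reason + ")");
    }
}
```

A test can then drive SimpleCoordinator directly and assert on RecordingComms.sent, with no zk cluster involved.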
bq. Does this barrier acquisition have any relation to zk barrier recipe?
http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_eventHandles
Yes. It is very similar to the double barrier. The main thing different here
is this code allows for any member or coordinator to abort/cancel the whole
shebang while the recipe doesn't seem to. From the recipe it seems that we
could be a little bit more clever about how we use our znodes. (we might have
one extra set).
bq. What is 'class' in the zk node hierarchy? Class of procedure?
The online-snapshots is a 'class' (e.g. all online snapshots) while a procedure
name is an actual name for a particular snapshotting request (snapshot121201,
snapshot121202 etc). Off the top of my head I can't think of any other HBase
processes that are ok with the procedure mechanism's semantics (other
operations like enabling, disabling, schema change, splitting, merging probably
want 2pc and its recovery requirements). I think this extra znode dir could
probably get removed.
> Globally Barriered Procedure mechanism
> --------------------------------------
>
> Key: HBASE-7212
> URL: https://issues.apache.org/jira/browse/HBASE-7212
> Project: HBase
> Issue Type: Sub-task
> Components: snapshots
> Affects Versions: hbase-6055
> Reporter: Jonathan Hsieh
> Assignee: Jonathan Hsieh
> Fix For: hbase-6055
>
> Attachments: 121127-global-barrier-proc.pdf, hbase-7212.patch,
> pre-hbase-7212.patch
>
>
> This is a simplified version of what was proposed in HBASE-6573. Instead of
> claiming to be a 2pc or 3pc implementation (which implies logging at each
> actor, and recovery operations) this just provides a best effort global
> barrier mechanism called a Procedure.
> Users need only to implement methods to acquireBarrier, to act when
> insideBarrier, and to releaseBarrier that use the ExternalException
> cooperative error checking mechanism.
> Globally consistent snapshots require the ability to quiesce writes to a set
> of region servers before the snapshot operation is executed. Also if any
> node fails, it needs to be able to notify them so that they abort.
> The first cut of other online snapshots don't need the full barrier but may
> still use this for its error propagation mechanisms.
> This version removes the extra layer incurred in the previous implementation
> due to the use of generics, separates the coordinator and members, and
> reduces the amount of inheritance used in favor of composition.