[
https://issues.apache.org/jira/browse/HBASE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876485#action_12876485
]
Jean-Daniel Cryans commented on HBASE-2238:
-------------------------------------------
I ran into a weird situation today running TestReplication (from HBASE-2223's
latest patch up on rb). The test kills a region server by expiring its session,
and then the following happened (almost all at the same time):
# Master lists all hlogs to split (total of 12)
# RS does a log roll
# RS tries to register the new log in ZK for replication and fails because the
session was expired, but the log is already rolled
# RS takes 3 more edits into the new log
# RS cleans up 6 of its 13 logs
# Master fails at splitting the 3rd log it listed, which delays the log
splitting process
# Master tries again to split the logs, lists 7 of them, and succeeds
In the end, the master wasn't missing any edits (because log splitting failed
the first time and picked up the new log on the second attempt), but the slave
cluster was missing 3. This makes me think the region server should also do a
better job of handling KeeperException.SessionExpiredException, because we
currently don't handle it at all.
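As a rough illustration of what I mean, here is a minimal sketch of handling the
expired session when registering a rolled log for replication. The class, method,
and field names below are made up for illustration and are not the actual
HBASE-2223 code:
{code}
import java.io.IOException;
import org.apache.zookeeper.KeeperException;

// Hypothetical sketch only; none of these names are the real replication code.
class ReplicationLogRegistration {
  private final ReplicationQueueZK replicationZK;   // assumed wrapper around ZK
  private final Abortable server;                   // assumed abort hook on the RS

  ReplicationLogRegistration(ReplicationQueueZK zk, Abortable server) {
    this.replicationZK = zk;
    this.server = server;
  }

  // Register the freshly rolled hlog for replication; treat an expired ZK
  // session as fatal instead of swallowing it, since any edit taken into an
  // unregistered hlog may never reach the slave cluster.
  void registerRolledLog(String hlogName) throws IOException {
    try {
      replicationZK.addLogToQueue(hlogName);
    } catch (KeeperException.SessionExpiredException e) {
      server.abort("ZK session expired while registering " + hlogName, e);
    } catch (KeeperException e) {
      throw new IOException("Failed to register " + hlogName + " in ZK", e);
    }
  }

  // Assumed collaborators, declared here only so the sketch is self-contained.
  interface ReplicationQueueZK {
    void addLogToQueue(String hlogName) throws KeeperException;
  }
  interface Abortable {
    void abort(String why, Throwable cause);
  }
}
{code}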
> Review all transitions -- compactions, splits, region opens, log splitting --
> for crash-proofyness and atomicity
> ----------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-2238
> URL: https://issues.apache.org/jira/browse/HBASE-2238
> Project: HBase
> Issue Type: Bug
> Reporter: stack
>
> This issue is about reviewing state transitions in hbase to ensure we're
> sufficiently hardened against crashes. I see this as an umbrella issue
> under which we'd look at compactions, splits, log splits, region opens --
> what else is there? We'd look at each in turn to see how we survive a crash at
> any time during the transition. For example, we think compactions are idempotent,
> but we need to prove it. Splits are for sure not, at least not at the moment
> (witness disabled parents with daughters missing, or only one of them
> available).
> Part of this issue would be writing tests that aim to break transitions.
> In light of the above, here is a recent off-list note from Todd Lipcon (and
> "Another"):
> {code}
> I thought a bit more last night about the discussion we were having
> regarding various HBase components doing operations on the HDFS data,
> and ensuring that in various racy scenarios we don't have two
> region servers or masters overlapping.
> I came to the conclusion that ZK data can't be used to actually have
> effective locks on HDFS directories, since we can never know that we
> still have a ZK lock when we do an operation. Thus the operations
> themselves have to be idempotent, or recoverable in the case of
> multiple nodes trying to do the same thing. Or, we have to use HDFS
> itself as a locking mechanism - this is what we discussed using write
> leases essentially as locks.
> Since I didn't really trust myself, I ran my thoughts by "Another"
> and he concurs (see
> below). Figured this is food for thought for designing HBase data
> management to be completely safe/correct.
> ...
> ---------- Forwarded message ----------
> From: Another <[email protected]>
> Date: Wed, Feb 17, 2010 at 10:50 AM
> Subject: locks
> To: Todd Lipcon <[email protected]>
> Short answer is no, you're right.
> Because HDFS and ZK are partitioned (in the sense that there's no
> communication between them) and there may be an unknown delay between
> acquiring the lock and performing the operation on HDFS, you have no
> way of knowing that you still own the lock, like you say.
> If the lock cannot be revoked while you have it (no timeouts) then you
> can atomically check that you still have the lock and do the operation
> on HDFS, because checking is a no-op. Designing a system with no lock
> revocation in the face of failures is an exercise for the reader :)
> The right way is for HDFS and ZK to communicate to construct an atomic
> operation. ZK could give a token to the client which it also gives to
> HDFS, and HDFS uses that token to do admission control. There's
> probably some neat theorem about causality and the impossibility of
> doing distributed locking without a sufficiently strong atomic
> primitive here.
> Another
> {code}
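To make the token idea above concrete, here is a minimal sketch of the
admission-control scheme under assumed interfaces. Neither ZK nor HDFS exposes
anything like this, so every name below is hypothetical:
{code}
// Hypothetical illustration of the token/admission-control idea above.
// The lock service hands out a monotonically increasing token with each
// grant, and the storage layer rejects operations carrying a stale token,
// so a writer whose lock silently expired cannot overlap the next holder.
interface LockService {
  long acquire(String resource);                    // returns a fencing token
}

interface FencedStorage {
  void write(String path, byte[] data, long token) throws StaleTokenException;
}

class StaleTokenException extends Exception {
  StaleTokenException(String msg) { super(msg); }
}

class FencedWriter {
  private final LockService locks;
  private final FencedStorage storage;

  FencedWriter(LockService locks, FencedStorage storage) {
    this.locks = locks;
    this.storage = storage;
  }

  void writeSafely(String path, byte[] data) throws StaleTokenException {
    long token = locks.acquire(path);
    // Even if our lock expires between these two calls, the storage layer
    // has by then seen a newer token from the next lock holder and will
    // reject this write rather than let two writers interleave.
    storage.write(path, data, token);
  }
}
{code}
The point of the sketch is that correctness no longer depends on the client
knowing whether it still holds the lock; the storage side enforces ordering by
comparing tokens.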