[ https://issues.apache.org/jira/browse/HBASE-18432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119262#comment-16119262 ]
Hadoop QA commented on HBASE-18432:
-----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 32s{color} | {color:green} HBASE-14070.HLC passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} HBASE-14070.HLC passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} HBASE-14070.HLC passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} HBASE-14070.HLC passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} HBASE-14070.HLC passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} HBASE-14070.HLC passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 39m 40s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 50s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 9s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 53s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:757bf37 |
| JIRA Issue | HBASE-18432 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880936/HBASE-18432.HBASE-14070.HLC.002.patch |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 9da3de9a3ace 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | HBASE-14070.HLC / d9a9904 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7989/testReport/ |
| modules | C: hbase-common U: hbase-common |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7989/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.

> Prevent clock from getting stuck after update()
> -----------------------------------------------
>
>                 Key: HBASE-18432
>                 URL: https://issues.apache.org/jira/browse/HBASE-18432
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Appy
>            Assignee: Appy
>         Attachments: HBASE-18432.HBASE-14070.HLC.001.patch,
>                      HBASE-18432.HBASE-14070.HLC.002.patch
>
>
> There were a [bunch of problems|https://issues.apache.org/jira/browse/HBASE-14070?focusedCommentId=16094013&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16094013]
> (also copied below) with the clock getting stuck after a call to update() until
> its own system time caught up.
> ----
> PT = physical time, LT = logical time, ST = system time, X = don't care terms
> ----
> Core issue:
> - Note that in the current implementation, we pass the master clock to the RS
> in open/close region requests and the RS clock to the master in the responses.
> Both update their own time on receiving these requests/responses.
> - On receiving a clock ahead of its own, each updates its own clock to the
> received PT+LT and keeps increasing LT until its own ST catches up to that PT.
> ----
> Proposed solution:
> Keep track of the skew in the clock. Instead of keeping track of physical time,
> always compute it by adding system time and skew.
> On update(), recalculate the skew and validate whether it's greater than max_skew.
> On toTimestamp(), calculate PT = ST+skew.
> -----
> -----
> Issues with current approach:
> ----
> Problem 1: Logical time window too small.
> RS clock (10, X)
> Master clock (20, X)
> Master --request-> RS
> RS clock (20, X)
> While the RS's physical Java clock (which backs the physical component of the
> HLC clock) will still take 10 sec to catch up, we'll keep incrementing the
> logical component. That means, in the worst case, our logical clock window
> should be big enough to support all the events that can happen in max skew time.
> The problem is, that doesn't seem to be the case. Our logical window is 1M
> events (20 bits) and max skew time is 30 sec, which results in 33k max write
> qps, which is quite low. We can easily see 150k update qps per beefy server
> with 1k values.
> Even 22 bits won't be enough. We'll need a minimum of 23 bits and a 20 sec max
> skew time to support ~420k max events per second in the worst-case clock skew.
> ----
> Problem 2: Cascading logical time increment.
> Consider the case where more RSs are involved, say 3 RSs and 1 master. Let's
> say max skew is 30 sec.
> HLC Clocks (physical time, logical time): X = don't care
> RS1: (50, 100k)
> Master: (40, X)
> RS2: (30, X)
> RS3: (20, X)
> [RS3's ST is behind RS1's by 30 sec.]
> RS1 replies to master, sends its clock (50, X).
> Master's clock (50, X). It'll be another 10 sec before its own physical
> clock reaches 50, so HLC's PT will remain 50 for the next 10 sec.
> Master --> RS2
> RS2's clock = (50, X).
> RS2 keeps incrementing LT on writes (since its own PT is behind) for a few
> seconds before it replies back to master with (50, X + few 100k).
> Master's clock = (50, X + few 100k) [Since the master's physical clock hasn't
> caught up yet (note that it was 10 seconds behind), PT remains 50.]
> Master --> RS3
> RS3's clock (50, X + few 100k)
> But RS3's ST is behind RS1's ST by 30 sec, which means it'll keep
> incrementing LT for the next 30 sec (unless it gets a newer clock from master).
> But the problem is, RS3 has a much smaller LT window than the actual 1M!!
> ----
> Problem 3: Single bad RS clock crashing the cluster:
> If a single RS's clock is bad and a bit fast, it'll keep pulling the master's
> PT along with it. If 'real time' is say 20, max skew time is 10, and the bad RS
> is at time 29.9, it'll pull master to 29.9 (via the next response), and then
> any RS at less than 19.9, i.e. just 0.1 sec away from real time, will die due
> to higher than max skew.
> This can bring whole clusters down!
> ----
> Problem 4: Time jumps (not a bug, but more of a nuisance)
> Say an RS is behind master by 20 sec. On each communication from master, the RS
> will update its own PT to master's PT, and it'll remain at that value until the
> RS's ST catches up. If there is frequent communication from master, ST might
> never catch up and the RS's PT will actually look like discrete time jumps
> rather than continuous time.
> E.g., if master communicated with the RS at times 30, 40, 50 (the RS's
> corresponding times are 10, 20, 30), then all events on the RS between time
> [10, 50] will be timestamped with either 30, 40 or 50.
> ----

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
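
The proposed solution in the quoted description (track the skew and always compute PT = ST + skew) can be illustrated with a small sketch. This is not the HBase patch: the class name `SkewTrackingClock`, the method `to_timestamp`, the `ClockSkewError` exception, and the choice to never shrink the skew are all assumptions made for illustration, and the time units are the abstract seconds used in the examples above.

```python
class ClockSkewError(Exception):
    """Raised when a received clock is further ahead than max_skew allows."""


class SkewTrackingClock:
    """Hypothetical hybrid logical clock that stores skew, not physical time."""

    def __init__(self, system_time, max_skew):
        self._system_time = system_time  # callable returning ST
        self._max_skew = max_skew
        self._skew = 0      # how far PT runs ahead of ST (assumed never to shrink)
        self._last_pt = 0   # PT of the last timestamp handed out
        self._logical = 0   # LT within that PT

    def to_timestamp(self):
        """Return the next (PT, LT) pair, with PT = ST + skew."""
        pt = self._system_time() + self._skew
        if pt > self._last_pt:
            # Physical time moved forward: reset the logical counter.
            self._last_pt, self._logical = pt, 0
        else:
            # Same physical time: order events with the logical counter.
            self._logical += 1
        return (self._last_pt, self._logical)

    def update(self, remote_pt, remote_lt):
        """Merge a received (PT, LT) and return the next local timestamp."""
        st = self._system_time()
        new_skew = max(self._skew, remote_pt - st)
        if new_skew > self._max_skew:
            raise ClockSkewError(f"skew {new_skew} exceeds max {self._max_skew}")
        self._skew = new_skew
        pt = st + self._skew
        if pt > self._last_pt:
            self._last_pt, self._logical = pt, 0
        if remote_pt == self._last_pt:
            # Stay ahead of the sender's logical component.
            self._logical = max(self._logical, remote_lt)
        self._logical += 1
        return (self._last_pt, self._logical)


# RS at ST=10 receives the master clock (20, 5), as in Problem 1.
now = [10]
rs = SkewTrackingClock(lambda: now[0], max_skew=30)
print(rs.update(20, 5))   # (20, 6)
now[0] = 11               # one second later on the RS
print(rs.to_timestamp())  # (21, 0)
```

In this sketch PT advances together with ST right after update(), so LT resets every tick instead of accumulating for the full skew duration, which is the behavior the description is aiming for.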