[ https://issues.apache.org/jira/browse/HBASE-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693862#comment-14693862 ]
Andrew Purtell commented on HBASE-14054:
----------------------------------------

{code}
// We have acquired the row lock already. If the system clock is NOT monotonically
// non-decreasing (see HBASE-14070) we should make sure that the mutation has a
// larger timestamp than what was observed via Get. doBatchMutate already does this, but
// there is no way to pass the cellTs. See HBASE-14054.
{code}
Agree we can use a workaround for checkAndX ahead of HBASE-14070. Nice tests. +1

> Acknowledged writes may get lost if regionserver clock is set backwards
> -----------------------------------------------------------------------
>
>                 Key: HBASE-14054
>                 URL: https://issues.apache.org/jira/browse/HBASE-14054
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.6
>         Environment: Linux
>            Reporter: Tobi Vollebregt
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0
>
>         Attachments: hbase-14054_v1.patch
>
>
> We experienced a small number of lost acknowledged writes in production on
> July 1st (~700 identified so far).
> What happened was that we had NTP turned off since June 29th to prevent
> issues due to the leap second on June 30th. NTP was turned back on July 1st.
> The next day, we noticed we were missing writes to a few of our higher
> throughput aggregation tables.
> We found that this is caused by HBase taking the current time using
> System.currentTimeMillis, which may be set backwards by NTP, and using this
> without any checks to populate the timestamp of rows for which the client
> didn't supply a timestamp.
> Our application uses a read-modify-write pattern using get+checkAndPut to
> perform aggregation as follows:
> 1. read version 1
> 2. mutate
> 3. write version 2
> 4. read version 2
> 5. mutate
> 6. write version 3
> The application retries the full read-modify-write if the checkAndPut fails.
> What must have happened on July 1st, after we started NTP back up, was this
> (timestamps added):
> 1. read version 1 (timestamp 10)
> 2. mutate
> 3. write version 2 (HBase-assigned timestamp 11)
> 4. read version 2 (timestamp 11)
> 5. mutate
> 6. write version 3 (HBase-assigned timestamp 10)
> Hence, the last write was eclipsed by the first write, and hence, an
> acknowledged write was lost.
> While this seems to match documented behavior (paraphrasing: "if timestamp is
> not specified HBase will assign a timestamp using System.currentTimeMillis"
> "the row with the highest timestamp will be returned by get"), I think it is
> very unintuitive and needs at least a big warning in the documentation, along
> the lines of "Acknowledged writes may not be visible unless the timestamp is
> explicitly specified and equal to or larger than the highest timestamp for
> that row".
> I would also like to use this ticket to start a discussion on whether we can
> make the behavior better:
> Could HBase assign a timestamp of {{max(max timestamp for the row,
> System.currentTimeMillis())}} in the checkAndPut write path, instead of
> blindly taking {{System.currentTimeMillis()}}, similar to what has been done
> in HBASE-12449 for increment and append?
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
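
For illustration, the timestamp sequence in the report (10, 11, 10) and the proposed max(...) assignment can be replayed against a toy single-row store. This is a sketch under stated assumptions, not HBase code: the class TimestampSketch, its methods, and the scripted clock are all hypothetical, and it assumes that a write at an equal timestamp replaces the older cell.

```java
import java.util.Objects;
import java.util.TreeMap;
import java.util.function.LongSupplier;

/** Toy single-row, single-column store (hypothetical; not the HBase implementation). */
class TimestampSketch {
    // Versions keyed by timestamp; get() returns the entry with the highest timestamp.
    private final TreeMap<Long, String> versions = new TreeMap<>();
    private final LongSupplier clock;    // stands in for System.currentTimeMillis()
    private final boolean clampToRowMax; // true = max(row max timestamp, now)

    TimestampSketch(LongSupplier clock, boolean clampToRowMax) {
        this.clock = clock;
        this.clampToRowMax = clampToRowMax;
    }

    /** Returns the value with the highest timestamp, or null if the row is empty. */
    synchronized String get() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    /** Writes newValue if the visible value equals expected; returns the assigned timestamp, or -1. */
    synchronized long checkAndPut(String expected, String newValue) {
        if (!Objects.equals(get(), expected)) {
            return -1; // check failed, nothing written
        }
        long ts = clock.getAsLong();
        if (clampToRowMax && !versions.isEmpty()) {
            // Never assign a timestamp below the newest one already in the row.
            ts = Math.max(versions.lastKey(), ts);
        }
        // Assumption: an equal timestamp replaces the older cell.
        versions.put(ts, newValue);
        return ts;
    }

    /** Replays the reported sequence with a clock that reads 10, 11, then 10 again. */
    static String replay(boolean clampToRowMax) {
        long[] ticks = {10, 11, 10};
        int[] next = {0};
        TimestampSketch row = new TimestampSketch(() -> ticks[next[0]++], clampToRowMax);
        row.checkAndPut(null, "v1"); // clock reads 10
        row.checkAndPut("v1", "v2"); // clock reads 11
        row.checkAndPut("v2", "v3"); // clock was set back and reads 10
        return row.get();
    }

    public static void main(String[] args) {
        System.out.println("raw clock:     " + replay(false));
        System.out.println("max(row, now): " + replay(true));
    }
}
```

With the raw clock, the third write lands at timestamp 10 and stays hidden behind version 2 at timestamp 11, even though all three checkAndPut calls succeed. With the clamp, it is written at timestamp 11 and becomes the visible value.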