Hello David Ribeiro Alves, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2333

to look at the new patch set (#4).

Change subject: KUDU-969. Fix handling of crashes just before tablet metadata 
flush
......................................................................

KUDU-969. Fix handling of crashes just before tablet metadata flush

This commit fixes a long-standing bug with the following race:

  - a compaction is in "duplicating" phase (i.e. updates are written to both
    the input and output rowsets
  - a update arrives which is duplicated to both input and output rowsets
  - we crash before writing the new metadata

In bootstrap, we see that the update was written to one rowset (i.e. the output
rowset) whose ID is not present in our metadata. We assumed incorrectly that
this meant the rowset had been written, and then compacted away, and therefore
considered it a "flushed" store. However, in fact, it really should be
considered an "unflushed" store.

Really, we cannot distinguish whether this update was flushed and later 
compacted
away, or if it was never flushed at all. So, this patch removes usage of the 
term
'unflushed' from the bootstrap code (see below for more details).

Despite having somewhat flawed reasoning, it turns out that most of the
bootstrap logic ended up being correct, anyway. In the case above, despite
incorrectly considering the edit "flushed", the other duplicated target (the
input rowset) was considered "unflushed", so we correctly replayed the edit.

However, we did not handle this case correctly in our sanity checks regarding
pending commits at the end of bootstrap. In particular, we checked that a
pending commit did not refer to any store which was considered "flushed". In
the above scenario, the duplicated "output" was designated as "flushed" and
caused the bootstrap process to flag a corruption.

The bulk of this patch is made up of a new fault point and integration test
which exercises crashes just prior to flushing tablet metadata. This ensures
that these scenarios in bootstrap are well covered, and that the resulting
tablet does not diverge from another replica which did not have a fault
injected.

The changes to the bootstrap code itself are actually not so large:

- First, we stop using the term 'unflushed' to refer to stores. Instead, we
  call a store 'active' if it was a current in-memory store at the time of
  crash according to the metadata. A store is considered inactive if either
    (a) we know it was flushed to disk (eg a DMS ID lower than the durable DMS
        ID for a given DRS), or:
    (b) it is a DRS ID which is not currently represented in the tablet 
metadata.
        This could either because the DRS was flushed and compacted away, or
        because it was a DRS that was in the process of being written when the
        server crashed.

- Because of the above terminology improvement, the patch renames
  'WasStoreAlreadyFlushed' to 'IsMemStoreActive' (and correspondingly inverts 
its
  return value). The logic itself is unchanged, though the comments are 
expanded.

- FilterMutate() and FilterInsert() are changed correspondingly to use the new
  method (WasStoreAlreadyFlushed() changed to !IsMemStoreActive())

- Now that we better understand these code paths, I replaced a section of TODO
  in FilterMutate() which previously had a DFATAL with a better comment and a
  Corruption status return.

- The checks for pending commits at the end of bootstrap were changed per the
  description above: we only need to ensure that _one_ target of each pending
  commit is active, not that _all_ targets are active.

Without the changes to tablet_bootstrap.cc, the test fails frequently with
errors like:

FAILED: Corruption: Failed log replay. Reason: CommitMsg was pending but it
    referred to flushed stores. ...

With the change, it passed 500/500 iterations:
http://dist-test.cloudera.org/job?job_id=todd.1456816652.10895

Change-Id: I1d996a470b9a7957ad3eb7fe02f22f85c32b5f9d
---
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_bootstrap.cc
6 files changed, 299 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/2333/4
-- 
To view, visit http://gerrit.cloudera.org:8080/2333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1d996a470b9a7957ad3eb7fe02f22f85c32b5f9d
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>

Reply via email to