[ https://issues.apache.org/jira/browse/KUDU-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated KUDU-969:
------------------------------------
Code Review: http://gerrit.cloudera.org:8080/#/c/2333/
> Bootstrap may occasionally mis-identify previously flushed updates
> ------------------------------------------------------------------
>
> Key: KUDU-969
> URL: https://issues.apache.org/jira/browse/KUDU-969
> Project: Kudu
> Issue Type: Bug
> Components: tablet
> Affects Versions: 0.5.0, 0.6.0, 0.7.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Blocker
> Fix For: 0.8.0
>
>
> tablet_bootstrap has the following TODO:
> {code}
> if (!FindCopy(flushed_dms_by_drs_id_, target.rs_id(), &last_durable_dms_id)) {
>   // if we have no data about this RowSet, then it must have been flushed and
>   // then deleted.
>   // TODO: how do we avoid a race where we get an update on a rowset before
>   // it is persisted? add docs about the ordering of flush.
>   return true;
> }
> {code}
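The TODO branch is easier to reason about when it sits next to the rest of the flushed-or-not decision. Below is a minimal C++17 sketch. It assumes that a replayed WAL entry names its target either as a MemRowSet ID or as a DiskRowSet ID plus a DeltaMemStore ID. `MemStoreTarget`, `FlushedStores` and `WasStoreAlreadyFlushed` are hypothetical names loosely modeled on the quoted snippet, not Kudu's actual classes. Only the `flushed_dms_by_drs_id` map and the "unknown rowset means already flushed" rule come from the code above. The MemRowSet branch is an assumption added for completeness.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical target of a replayed mutation: either a MemRowSet (mrs_id),
// or the DeltaMemStore (dms_id) of a DiskRowSet (rs_id).
struct MemStoreTarget {
  std::optional<int64_t> mrs_id;
  std::optional<int64_t> rs_id;
  std::optional<int64_t> dms_id;
};

// Hypothetical view of what the last durable tablet metadata records.
struct FlushedStores {
  int64_t last_durable_mrs_id = -1;
  // DiskRowSet ID -> highest DeltaMemStore ID already flushed for it.
  std::map<int64_t, int64_t> flushed_dms_by_drs_id;

  bool WasStoreAlreadyFlushed(const MemStoreTarget& target) const {
    if (target.mrs_id) {
      // Assumption: MemRowSet IDs grow monotonically, so any ID at or
      // below the last durable one has already been flushed.
      return *target.mrs_id <= last_durable_mrs_id;
    }
    assert(target.rs_id && target.dms_id);
    auto it = flushed_dms_by_drs_id.find(*target.rs_id);
    if (it == flushed_dms_by_drs_id.end()) {
      // The rule carrying the TODO: a rowset the metadata does not know
      // about is assumed to have been flushed and then deleted.
      return true;
    }
    // Known rowset: flushed only if its DMS is at or below the durable one.
    return *target.dms_id <= it->second;
  }
};
```

With `last_durable_mrs_id = 3` and rowset 1 flushed through DMS 2, an update to DMS 3 of rowset 1 would be replayed. Any update naming a rowset the metadata has never heard of, such as 7, is skipped as already flushed. That is exactly the assumption the TODO questions.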
> alter_table-randomized-test, when run in a loop under TSAN, seems to fail
> after around 30 iterations with a sequence like:
> - a compaction enters the "duplicating" phase
> - an update arrives and is duplicated into both the old and the new rowset IDs
> -- the new rowset ID isn't part of the metadata yet
> - we get killed with kill -9 before the compaction's metadata is flushed
> It seems that we then mis-identify the update to the "new" store as already
> flushed, which can cause the bootstrap to fail (or possibly result in a
> missing update).
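The failure sequence can be made concrete by replaying it against the same "unknown rowset means already flushed" rule. The C++ sketch below does this. The rowset and DeltaMemStore IDs are made up for illustration: rowset 1 is the compaction input and rowset 5 is its not-yet-committed output. The helper names are hypothetical as well. The sketch models only the flushed-or-not decision, not Kudu's actual bootstrap code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical DiskRowSet branch of the bootstrap check, including the
// "unknown rowset => already flushed and deleted" assumption from the TODO.
bool DrsTargetLooksFlushed(const std::map<int64_t, int64_t>& flushed_dms_by_drs_id,
                           int64_t rs_id, int64_t dms_id) {
  auto it = flushed_dms_by_drs_id.find(rs_id);
  if (it == flushed_dms_by_drs_id.end()) return true;
  return dms_id <= it->second;
}

struct ReplayVerdict {
  bool old_target_flushed;
  bool new_target_flushed;
};

// Replays the crash sequence from the report using made-up IDs.
ReplayVerdict ReplayDuplicatedUpdate() {
  // Durable metadata at crash time. The compaction output was never
  // persisted, so only input rowset 1 is known, with DMS 0 already flushed.
  std::map<int64_t, int64_t> flushed_dms_by_drs_id = {{1, 0}};

  // The update arrived during the duplicating phase and was written to both
  // rowsets: DMS 1 of input rowset 1 and DMS 0 of output rowset 5.
  bool old_flushed = DrsTargetLooksFlushed(flushed_dms_by_drs_id, 1, 1);
  bool new_flushed = DrsTargetLooksFlushed(flushed_dms_by_drs_id, 5, 0);
  return {old_flushed, new_flushed};
}
```

Under these assumptions, the old target is judged unflushed and the new target is judged already flushed. The two copies of a single update therefore receive contradictory verdicts during replay.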