[ https://issues.apache.org/jira/browse/KUDU-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated KUDU-969:
-----------------------------
Affects Version/s: 0.6.0, 0.7.0, 0.5.0 (was: Private Beta)
Target Version/s: 0.8.0 (was: GA)
> Bootstrap may occasionally mis-identify previously flushed updates
> ------------------------------------------------------------------
>
> Key: KUDU-969
> URL: https://issues.apache.org/jira/browse/KUDU-969
> Project: Kudu
> Issue Type: Bug
> Components: tablet
> Affects Versions: 0.6.0, 0.7.0, 0.5.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Blocker
>
> tablet_bootstrap has the following TODO:
> {code}
> if (!FindCopy(flushed_dms_by_drs_id_, target.rs_id(),
>               &last_durable_dms_id)) {
>   // if we have no data about this RowSet, then it must have been flushed and
>   // then deleted.
>   // TODO: how do we avoid a race where we get an update on a rowset before
>   // it is persisted? add docs about the ordering of flush.
>   return true;
> }
> {code}
> alter_table-randomized-test, when looped under TSAN, seems to fail after
> around 30 iterations with a sequence like:
> - a compaction enters the "duplicating" phase
> - an update arrives, which is duplicated into both the old and the new rowset IDs
> -- the new rowset ID isn't part of the metadata yet
> - we get killed with kill -9 before we flush the metadata from the compaction
> It seems that we then mis-identify the update to the "new" store as already
> flushed: since the new rowset ID has no entry in flushed_dms_by_drs_id_, the
> FindCopy() branch above returns true. This can cause the bootstrap to fail
> (or possibly cause a missing update).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)