[ https://issues.apache.org/jira/browse/KUDU-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated KUDU-969:
------------------------------------
    Code Review: http://gerrit.cloudera.org:8080/#/c/2333/

> Bootstrap may occasionally mis-identify previously flushed updates
> ------------------------------------------------------------------
>
>                 Key: KUDU-969
>                 URL: https://issues.apache.org/jira/browse/KUDU-969
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 0.5.0, 0.6.0, 0.7.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.8.0
>
>
> tablet_bootstrap has the following TODO:
> {code}
>     if (!FindCopy(flushed_dms_by_drs_id_, target.rs_id(), &last_durable_dms_id)) {
>       // if we have no data about this RowSet, then it must have been flushed and
>       // then deleted.
>       // TODO: how do we avoid a race where we get an update on a rowset before
>       // it is persisted? add docs about the ordering of flush.
>       return true;
>     }
> {code}
> alter_table-randomized-test, when run in a loop under TSAN, seems to fail after around 30 iterations with a sequence like:
> - a compaction enters "duplicating" phase
> - an update arrives, which is duplicated into both the old and the new rowset IDs
> -- the new rowset ID isn't part of the metadata yet
> - we get killed with kill -9 before we flush the metadata from the compaction
> It seems that we then mis-identify the update to the "new" store as already flushed, which can cause the bootstrap to fail (or maybe cause a missing update).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
