[ 
https://issues.apache.org/jira/browse/KUDU-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated KUDU-1131:
-------------------------------------
    Priority: Critical  (was: Blocker)

Lowering the priority since it isn't a burning issue anymore.

> Crash in compaction due to overlapping flush/undo snapshots
> -----------------------------------------------------------
>
>                 Key: KUDU-1131
>                 URL: https://issues.apache.org/jira/browse/KUDU-1131
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: Private Beta
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>              Labels: crash
>         Attachments: alter_table-randomized-test.txt.gz
>
>
> Binglin is triggering a crash reasonably regularly under load:
> - a tablet is flushed with a snapshot in which at least one txn is still in
> flight while a txn with a later timestamp has already committed, e.g.:
> -- txns 1 and 3 are committed and txn 2 is in flight. The flush snapshot
> then treats a txn as committed if its timestamp is <= 1 or == 3.
> - as of KUDU-987, we don't wait for all in-flight transactions to commit
> during flush (necessary since a txn might stay in flight for a while)
> - because txn 3 was committed, the UNDO delta has a timestamp range of [1, 3]
> - we then select the newly-flushed rowset for compaction while txn 2 is
> _still_ not committed
> -- at this point, we hit a CHECK failure because we see an UNDO file which
> can't be fully ignored by a compaction (its time range overlaps with
> uncommitted ranges in the current snapshot)
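
The scenario in the description can be sketched as a small model. This is only an illustration, not Kudu's actual C++ code: the `Snapshot` class, its fields, and `may_have_uncommitted_in_range` are hypothetical names, and timestamps are treated as small integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical MVCC snapshot (not Kudu's real class).

    Every timestamp <= all_committed_upto is committed, plus any
    later timestamps listed explicitly in committed_later.
    """
    all_committed_upto: int
    committed_later: frozenset = frozenset()

    def is_committed(self, ts: int) -> bool:
        return ts <= self.all_committed_upto or ts in self.committed_later

    def may_have_uncommitted_in_range(self, lo: int, hi: int) -> bool:
        # True if any timestamp in the inclusive range [lo, hi] is
        # not committed according to this snapshot.
        return any(not self.is_committed(ts) for ts in range(lo, hi + 1))


# Flush snapshot from the report: txns 1 and 3 committed, txn 2 in flight.
flush_snap = Snapshot(all_committed_upto=1, committed_later=frozenset({3}))

# The flushed UNDO delta covers timestamps [1, 3].
undo_range = (1, 3)

# Compaction runs while txn 2 is still in flight, so its snapshot has the
# same shape. The UNDO range overlaps an uncommitted timestamp, which is
# the condition the description says trips the CHECK.
overlaps_during_compaction = flush_snap.may_have_uncommitted_in_range(*undo_range)

# Once txn 2 commits, everything up to 3 is committed and the overlap is gone.
later_snap = Snapshot(all_committed_upto=3)
overlaps_after_commit = later_snap.may_have_uncommitted_in_range(*undo_range)
```

In this model the overlap exists only while txn 2 is in flight, which fits the report that the crash shows up under load, when a txn can stay in flight across both a flush and the following compaction.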



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
