[ https://issues.apache.org/jira/browse/KUDU-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated KUDU-1131:
-------------------------------------
Priority: Critical (was: Blocker)
Lowering the priority since it isn't a burning issue anymore.
> Crash in compaction due to overlapping flush/undo snapshots
> -----------------------------------------------------------
>
> Key: KUDU-1131
> URL: https://issues.apache.org/jira/browse/KUDU-1131
> Project: Kudu
> Issue Type: Bug
> Components: tablet
> Affects Versions: Private Beta
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Labels: crash
> Attachments: alter_table-randomized-test.txt.gz
>
>
> Binglin is triggering a crash fairly regularly under load:
> - a tablet is flushed with a snapshot that has at least one txn in flight,
> but a txn with a later timestamp has already committed, e.g.:
> -- txns 1 and 3 are committed and txn 2 is in flight. This gives a flush
> snapshot of txn <= 1 or txn == 3.
> - as of KUDU-987, we don't wait for all in-flight transactions to commit
> during flush (necessary since the txn might be in flight for a while)
> - because txn 3 was committed, the UNDO delta has a ts range of [1, 3]
> - we then select the newly-flushed rowset for compaction, and txn 2 is
> _still_ not committed
> -- at this point, we hit a CHECK failure because we see an UNDO file which
> can't be fully ignored by a compaction (its time range overlaps with
> uncommitted ranges in the current snapshot)
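
The scenario in the description can be modelled with a small sketch. This is not Kudu's code. The names `Snapshot`, `is_committed`, and `classify_undo_range` are hypothetical and exist only for illustration, and integer timestamps are assumed so that every timestamp in a range can be enumerated. The sketch shows why an UNDO delta with ts range [1, 3] is neither fully committed nor fully uncommitted while txn 2 is still in flight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical MVCC snapshot model (not Kudu's actual class).

    Every txn with timestamp <= all_committed_up_to is committed, plus any
    timestamps listed explicitly in committed_extras.
    """
    all_committed_up_to: int
    committed_extras: frozenset = frozenset()

    def is_committed(self, ts: int) -> bool:
        return ts <= self.all_committed_up_to or ts in self.committed_extras


def classify_undo_range(snap: Snapshot, min_ts: int, max_ts: int) -> str:
    """Classify an UNDO delta's inclusive timestamp range against a snapshot."""
    states = {snap.is_committed(ts) for ts in range(min_ts, max_ts + 1)}
    if states == {True}:
        return "all_committed"
    if states == {False}:
        return "none_committed"
    # Some timestamps are committed and some are not: the range overlaps
    # an uncommitted region of the snapshot.
    return "mixed"


# Flush snapshot from the description: txn <= 1 or txn == 3 (txn 2 in flight).
flush_snap = Snapshot(all_committed_up_to=1, committed_extras=frozenset({3}))

# The UNDO delta covers [1, 3]. A compaction that runs while txn 2 is still
# uncommitted sees a mixed range, which corresponds to the CHECK failure.
print(classify_undo_range(flush_snap, 1, 3))

# Once txn 2 commits, the same range is fully committed.
print(classify_undo_range(Snapshot(all_committed_up_to=3), 1, 3))
```

In this model, the crash corresponds to the "mixed" case: the compaction can only ignore an UNDO file outright when its range does not overlap any uncommitted range in the snapshot.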