[ https://issues.apache.org/jira/browse/KUDU-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491021#comment-16491021 ]
Will Berkeley commented on KUDU-1678:
-------------------------------------
Noting that this has now been seen in the wild. It caused all 3 tservers
hosting replicas of a tablet to crash, and they could not restart until the
unaborted write was truncated from the WAL.
> Race during abort of pending operations during raft shutdown
> ------------------------------------------------------------
>
> Key: KUDU-1678
> URL: https://issues.apache.org/jira/browse/KUDU-1678
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: 1.0.0, 1.6.0
> Reporter: Todd Lipcon
> Priority: Major
>
> I'm seeing the following race occasionally in alter_table-randomized-test:
> - a follower tablet is shutting down while some operations are pending. The
> first operation is an ALTER_TABLE, and the second is a WRITE that depends on
> the ALTER (i.e. it includes the new column)
> - we cancel the ALTER successfully, and then the thread gets de-scheduled
> - the PrepareTask for the WRITE runs before we're able to cancel it. It then
> fails to prepare because the ALTER it depends on has not completed
> It seems like we should probably cancel the pending operations in reverse
> order.
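A rough sketch of why reverse-order cancellation closes the window described above. This is a single-threaded model, not Kudu's consensus code: `PendingOp`, `has_orphaned_dependent`, and `abort_all` are hypothetical names, and it assumes a dependent op always sits at a higher log index than the op it depends on (a WRITE that uses the new column comes after its ALTER_TABLE). Between each cancellation it checks whether some op is still pending while its dependency is already aborted, which is exactly the state in which the WRITE's PrepareTask can run and fail.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PendingOp:
    """Hypothetical model of one pending operation on a follower."""
    op_type: str                # e.g. "ALTER_TABLE" or "WRITE"
    depends_on: Optional[int]   # log index of the op this one needs, if any
    aborted: bool = False


def has_orphaned_dependent(ops: Dict[int, PendingOp]) -> bool:
    """True if a still-pending op depends on an op that is already aborted."""
    for op in ops.values():
        if op.aborted or op.depends_on is None:
            continue
        dep = ops.get(op.depends_on)
        if dep is not None and dep.aborted:
            return True
    return False


def abort_all(ops: Dict[int, PendingOp], newest_first: bool) -> bool:
    """Abort every op one at a time (by log index) and report whether an
    orphaned dependent was ever exposed between two cancellations."""
    exposed = False
    for index in sorted(ops, reverse=newest_first):
        ops[index].aborted = True
        exposed = exposed or has_orphaned_dependent(ops)
    return exposed


def make_ops() -> Dict[int, PendingOp]:
    # Index 1: the ALTER; index 2: a WRITE that depends on it.
    return {
        1: PendingOp("ALTER_TABLE", None),
        2: PendingOp("WRITE", depends_on=1),
    }


print(abort_all(make_ops(), newest_first=False))  # True
print(abort_all(make_ops(), newest_first=True))   # False
```

Cancelling oldest-first aborts the ALTER while the WRITE is still pending; cancelling newest-first always removes the WRITE before the ALTER it needs, so that state never appears. The real fix would still have to make sure a PrepareTask that is already running sees the abort, which this model does not cover.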
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)