[jira] [Comment Edited] (KUDU-1678) Race during abort of pending operations during raft shutdown

Alexey Serbin (JIRA) Wed, 29 Aug 2018 14:21:41 -0700


    [ https://issues.apache.org/jira/browse/KUDU-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592395#comment-16592395 ]


Alexey Serbin edited comment on KUDU-1678 at 8/29/18 9:20 PM:
--------------------------------------------------------------

This bug causes flakiness of the 
{{HmsConfigurations/AlterTableRandomized.TestRandomSequence}} scenario (both in 
HMS and non-HMS modes) in the {{alter_table-randomized-test}}.  The bug 
manifests itself in all build configurations: RELEASE, DEBUG, ASAN, TSAN.

In my observation, running the test with the {{--stress_cpu_threads=16}} flag 
makes the bug manifest more often.  I ran the test scenario via dist-test 
~300 times in every build configuration with that flag set.

Attached are examples of the failed scenario's output, DEBUG and RELEASE 
builds. [^alter_table-randomized-test-release.log.xz]  
[^alter_table-randomized-test-debug.log.xz] 


was (Author: aserbin):
This bug causes flakiness of the 
{{HmsConfigurations/AlterTableRandomized.TestRandomSequence}} scenario (both in 
HMS and non-HMS modes) in the {{alter_table-randomized-test}}.  The bug 
manifests itself in all build configurations: RELEASE, DEBUG, ASAN, TSAN.

Attached are examples of the failed scenario's output, DEBUG and RELEASE 
builds. [^alter_table-randomized-test-release.log.xz]  
[^alter_table-randomized-test-debug.log.xz] 

> Race during abort of pending operations during raft shutdown
> ------------------------------------------------------------
>
>                 Key: KUDU-1678
>                 URL: https://issues.apache.org/jira/browse/KUDU-1678
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 1.0.0, 1.6.0, 1.8.0
>            Reporter: Todd Lipcon
>            Priority: Major
>         Attachments: alter_table-randomized-test-debug.log.xz, 
> alter_table-randomized-test-release.log.xz
>
>
> I'm seeing the following race occasionally in alter_table-randomized-test:
> - a follower tablet is shutting down while some operations are pending. The 
> first operation is an ALTER_TABLE, and the second is a WRITE which depends on 
> the ALTER (i.e., it includes the new column)
> - we cancel the ALTER successfully, and then the thread gets de-scheduled
> - the PrepareTask for the WRITE runs before we're able to cancel it. It then 
> fails to prepare because the alter it depends on has not completed
> It seems like we should probably cancel the pending operations in reverse 
> order.
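
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kudu's actual consensus code: {{PendingOp}}, {{abort_pending}}, {{make_ops}}, and the state names are hypothetical, and the "de-scheduling" of the aborting thread is simulated deterministically by letting every still-pending operation attempt its prepare step after each abort.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingOp:
    # Hypothetical stand-in for a pending consensus operation.
    name: str
    depends_on: Optional[str] = None  # name of an earlier op this one needs
    state: str = "PENDING"            # PENDING, ABORTED, or PREPARE_FAILED


def abort_pending(ops: List[PendingOp], reverse: bool) -> List[str]:
    """Abort ops one at a time, in forward or reverse order.

    After each abort, every still-pending op attempts to prepare, which
    models the aborting thread being de-scheduled at the worst moment.
    Returns the names of ops whose prepare step failed.
    """
    by_name = {op.name: op for op in ops}
    failures = []
    order = list(reversed(ops)) if reverse else list(ops)
    for victim in order:
        if victim.state == "PENDING":
            victim.state = "ABORTED"
        # Simulated preemption: remaining ops run their prepare step now.
        for op in ops:
            if op.state != "PENDING":
                continue
            dep = by_name.get(op.depends_on) if op.depends_on else None
            if dep is not None and dep.state == "ABORTED":
                # The op it depends on never completed, so prepare fails.
                op.state = "PREPARE_FAILED"
                failures.append(op.name)
    return failures


def make_ops() -> List[PendingOp]:
    # The scenario from the report: an ALTER_TABLE followed by a dependent WRITE.
    return [PendingOp("ALTER_TABLE"), PendingOp("WRITE", depends_on="ALTER_TABLE")]


print(abort_pending(make_ops(), reverse=False))  # ['WRITE']
print(abort_pending(make_ops(), reverse=True))   # []
```

In this model, forward-order abort leaves a window in which the WRITE prepares against an already-aborted ALTER_TABLE, while reverse-order abort removes each dependent operation before the operation it depends on.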



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
