[ 
https://issues.apache.org/jira/browse/ARROW-14734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456274#comment-17456274
 ] 

Weston Pace commented on ARROW-14734:
-------------------------------------

The actual culprit appears to be SignalCancelTest.  At least, I am unable to 
reproduce this by running CountingSemaphore.Basic alone.  However, if I run 
SignalCancelTest and CountingSemaphore.Basic at the same time then it will 
fail, sometimes without any output, which matches the description of this 
issue and makes it look like CountingSemaphore.Basic failed.

I can also reproduce it by running SignalCancelTest on repeat, so far only 
with RegisterUnregister.  I was able to capture a stack trace in Visual Studio, 
and it looks like the detached thread in cancel_test.cc:182 is raising a signal 
that is not caught by any custom handler and thus terminates the application.

I can only reproduce it if I stress the CPU.

My guess is that the test is somehow tearing down (and removing the 
cancellation guard) before the signalling thread actually raises the signal.
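
To illustrate the ordering I have in mind, here is a minimal standalone sketch. 
It is not the Arrow test code: RecordSignal and RaiseWithHandlerStillInstalled 
are hypothetical names, and it uses plain std::signal rather than Arrow's 
cancellation guard.  It installs a handler, raises SIGINT from a second thread, 
and joins that thread before restoring the previous handler.  If the thread 
were detached and the handler restored first, a delayed raise could hit the 
default disposition and terminate the process, which is the failure I suspect.

```cpp
#include <cassert>
#include <chrono>
#include <csignal>
#include <thread>

// Set from the signal handler; volatile sig_atomic_t is the portable type
// for a flag written inside a handler.
static volatile std::sig_atomic_t g_signal_seen = 0;

// Hypothetical handler standing in for a custom cancellation handler.
static void RecordSignal(int) { g_signal_seen = 1; }

// Hypothetical helper: returns true if the signal raised by the second
// thread was observed by the custom handler.
bool RaiseWithHandlerStillInstalled(std::chrono::milliseconds delay) {
  assert(delay.count() >= 0);
  g_signal_seen = 0;

  // Install the custom handler and remember the previous one.
  auto previous = std::signal(SIGINT, RecordSignal);
  if (previous == SIG_ERR) return false;

  std::thread signaller([delay] {
    // The delay stands in for the thread being scheduled late under load.
    std::this_thread::sleep_for(delay);
    std::raise(SIGINT);
  });

  // Joining (instead of detaching) guarantees the raise has already
  // happened before the handler is removed below.
  signaller.join();

  // Teardown: restore whatever handler was installed before.
  std::signal(SIGINT, previous);
  return g_signal_seen == 1;
}
```

Calling RaiseWithHandlerStillInstalled(std::chrono::milliseconds(10)) should 
return true regardless of how long the signalling thread is delayed, because 
teardown cannot run until the thread has finished.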

> [C++][CI] CountingSemaphore sporadic test crash
> -----------------------------------------------
>
>                 Key: ARROW-14734
>                 URL: https://issues.apache.org/jira/browse/ARROW-14734
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Continuous Integration
>            Reporter: Antoine Pitrou
>            Assignee: Weston Pace
>            Priority: Major
>
> This may be a fluke, but this crash appeared on CI:
> https://github.com/apache/arrow/runs/4234285140?check_suite_focus=true#step:8:110
> {code}
> [==========] Running 134 tests from 23 test suites.
> [----------] Global test environment set-up.
> [----------] 7 tests from CancelTest
> [ RUN      ] CancelTest.StopBasics
> [       OK ] CancelTest.StopBasics (0 ms)
> [ RUN      ] CancelTest.StopTokenCopy
> [       OK ] CancelTest.StopTokenCopy (0 ms)
> [ RUN      ] CancelTest.RequestStopTwice
> [       OK ] CancelTest.RequestStopTwice (0 ms)
> [ RUN      ] CancelTest.Unstoppable
> [       OK ] CancelTest.Unstoppable (0 ms)
> [ RUN      ] CancelTest.SourceVanishes
> [       OK ] CancelTest.SourceVanishes (0 ms)
> [ RUN      ] CancelTest.ThreadedPollSuccess
> [       OK ] CancelTest.ThreadedPollSuccess (11 ms)
> [ RUN      ] CancelTest.ThreadedPollCancel
> [       OK ] CancelTest.ThreadedPollCancel (11 ms)
> [----------] 7 tests from CancelTest (23 ms total)
> [----------] 2 tests from SignalCancelTest
> [ RUN      ] SignalCancelTest.Register
> [       OK ] SignalCancelTest.Register (1 ms)
> [ RUN      ] SignalCancelTest.RegisterUnregister
> [       OK ] SignalCancelTest.RegisterUnregister (111 ms)
> [----------] 2 tests from SignalCancelTest (113 ms total)
> [----------] 3 tests from CountingSemaphore
> [ RUN      ] CountingSemaphore.Basic
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
