[ 
https://issues.apache.org/jira/browse/ARROW-15604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488548#comment-17488548
 ] 

Weston Pace commented on ARROW-15604:
-------------------------------------

I've run into bugs like this before.  I don't think OT is really the cause, but 
it seems to increase the likelihood of failure.  Basically, we have async tasks 
that do something like...

 * Run task
 * Mark future finished with result (at this point the main thread is free to 
exit and start shutdown)
 * Cleanup task
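
To make the ordering concrete, here is a minimal sketch of those three steps 
using only the standard library.  It is not Arrow's actual task code: 
RunTaskOnce, Observed and the allow_cleanup gate are hypothetical names, and 
the gate exists only to hold the worker between steps 2 and 3 so the window 
can be observed deterministically.

```cpp
#include <atomic>
#include <future>
#include <thread>

// Hypothetical illustration of the three steps; not Arrow's actual task code.
struct Observed {
  int result;
  bool cleanup_done_when_result_ready;
  bool cleanup_done_after_join;
};

Observed RunTaskOnce() {
  std::promise<int> result_promise;   // stands in for the task's future
  std::promise<void> allow_cleanup;   // hypothetical gate that holds the worker before step 3
  std::atomic<bool> cleanup_done{false};

  std::future<int> result = result_promise.get_future();
  std::future<void> cleanup_gate = allow_cleanup.get_future();

  std::thread worker([&] {
    int value = 6 * 7;                // 1. run task
    result_promise.set_value(value);  // 2. mark future finished with result
    cleanup_gate.wait();              //    (gate only makes the window observable)
    cleanup_done.store(true);         // 3. cleanup task, after the waiter was released
  });

  Observed out{};
  out.result = result.get();          // the waiting thread is free to continue from here
  out.cleanup_done_when_result_ready = cleanup_done.load();
  allow_cleanup.set_value();          // let the worker proceed to its cleanup
  worker.join();                      // only now is the cleanup known to have finished
  out.cleanup_done_after_join = cleanup_done.load();
  return out;
}
```

The result is available while the cleanup has not yet run.  If the waiting 
thread were the main thread and it returned from main at that point, the 
cleanup would overlap with process shutdown.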

If anything in the Cleanup task accesses global state that shutdown has already 
torn down, we could get this error.  In the past the problem was that a task 
was accessing the default memory pool in its cleanup (I don't recall why).  A 
short-term fix is to update the test so it isn't using the eternal thread pool, 
or to call WaitForIdle on the CPU thread pool, but these feel more like hacks 
than real fixes, as a real customer would still have a segfault at shutdown.
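
As a rough sketch of the "wait for idle before shutting down" workaround, 
assuming a toy pool rather than Arrow's thread pool: TinyPool and 
CleanupsSeenAfterWait are hypothetical names, and this WaitForIdle simply 
joins every worker, which is far simpler than a real pool.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical miniature pool; WaitForIdle here simply joins every worker.
class TinyPool {
 public:
  void Spawn(std::function<void()> task) { workers_.emplace_back(std::move(task)); }

  void WaitForIdle() {
    for (std::thread& worker : workers_) worker.join();
    workers_.clear();
  }

  ~TinyPool() { WaitForIdle(); }

 private:
  std::vector<std::thread> workers_;
};

int CleanupsSeenAfterWait(int num_tasks) {
  std::atomic<int> cleanups{0};  // stands in for global state touched during cleanup
  TinyPool pool;
  for (int i = 0; i < num_tasks; ++i) {
    pool.Spawn([&cleanups] {
      // Simulate a cleanup step that lags behind the task's result.
      std::this_thread::sleep_for(std::chrono::milliseconds(5));
      cleanups.fetch_add(1);
    });
  }
  pool.WaitForIdle();  // the workaround: do not proceed towards shutdown until idle
  return cleanups.load();
}
```

Once WaitForIdle returns, every cleanup has completed, so nothing is left to 
race with exit hooks.  This only helps callers who remember to wait, which is 
why it reads as a workaround for the test rather than a fix.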

In this case it seems the cleanup step is doing something with OT (which makes 
perfect sense).

I don't suppose there is any way to block shutdown until the eternal thread 
pool is idle?  It could probably be made signal-safe if we waited with a busy 
loop, but then I think you run the risk of shutdown delays.

> [C++][CI] Sporadic ThreadSanitizer failure with OpenTracing
> -----------------------------------------------------------
>
>                 Key: ARROW-15604
>                 URL: https://issues.apache.org/jira/browse/ARROW-15604
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Continuous Integration
>            Reporter: Antoine Pitrou
>            Priority: Major
>
> The error is a heap-use-after-free and involves an OpenTracing structure that 
> was deleted by an atexit hook.
> https://github.com/ursacomputing/crossbow/runs/5097362072?check_suite_focus=true#step:5:4843
> Summary:
> {code}
>   Atomic write of size 4 at 0x7b08000136a8 by thread T2:
>   [...]
>     #10 opentelemetry::v1::context::RuntimeContext::GetRuntimeContextStorage() /build/cpp/opentelemetry_ep-install/include/opentelemetry/context/runtime_context.h:156:12 (libarrow.so.800+0x1e62ef7)
>     #11 opentelemetry::v1::context::RuntimeContext::Detach(opentelemetry::v1::context::Token&) /build/cpp/opentelemetry_ep-install/include/opentelemetry/context/runtime_context.h:97:54 (libarrow.so.800+0x1e70178)
>     #12 opentelemetry::v1::context::Token::~Token() /build/cpp/opentelemetry_ep-install/include/opentelemetry/context/runtime_context.h:168:3 (libarrow.so.800+0x1e7012f)
>   [...]
> {code}
> {code}
>   Previous write of size 8 at 0x7b08000136a8 by main thread:
>     #0 operator delete(void*) <null> (arrow-dataset-scanner-test+0x16a69e)
>   [...]
>     #7 opentelemetry::v1::nostd::shared_ptr<opentelemetry::v1::context::RuntimeContextStorage>::~shared_ptr() /build/cpp/opentelemetry_ep-install/include/opentelemetry/nostd/shared_ptr.h:98:30 (libarrow.so.800+0x1e62fb3)
>     #8 cxa_at_exit_wrapper(void*) <null> (arrow-dataset-scanner-test+0x11866f)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
