Weston Pace created ARROW-16161:
-----------------------------------

             Summary: [C++] Overhead of std::shared_ptr<DataType> copies is 
causing thread contention
                 Key: ARROW-16161
                 URL: https://issues.apache.org/jira/browse/ARROW-16161
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: C++
            Reporter: Weston Pace


We created a benchmark to measure ExecuteScalarExpression performance in 
ARROW-16014.  We noticed significant thread contention (even though there 
shouldn't be much, if any, for this task).  As part of ARROW-16138 we have 
been investigating possible causes.

One cause seems to be contention from copying shared_ptr<DataType> objects.
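
To illustrate where the cost comes from, here is a minimal sketch.  The 
DataType struct is a hypothetical stand-in, not the real arrow::DataType, and 
the function names are made up for this example.  Copying a shared_ptr bumps a 
reference count that is typically updated atomically, so many threads copying 
the same pointer all write to the same cache line; borrowing by reference 
avoids that write entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for arrow::DataType (assumption, not the real class).
struct DataType {
  std::string name;
};

// Taking the shared_ptr by value copies it: the reference count is
// incremented on entry and decremented on exit.  With many threads sharing
// one DataType, every call writes to the same control block.
inline std::size_t NameLengthByCopy(std::shared_ptr<DataType> type) {
  return type->name.size();
}

// Borrowing through a const reference never touches the reference count.
inline std::size_t NameLengthByRef(const DataType& type) {
  return type.name.size();
}

// Returns the use count observed while one extra copy is alive.
inline long UseCountDuringCopy(const std::shared_ptr<DataType>& type) {
  std::shared_ptr<DataType> copy = type;  // reference count goes up by one
  return copy.use_count();
}
```

Both NameLength functions return the same answer; only the by-copy variant 
pays for the reference count traffic.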

Two possible solutions come to mind, and I'm sure there are many more.

ExecBatch is an internal type used inside ExecuteScalarExpression as well as 
inside the execution engine.  In the former we can safely assume the data 
types will exist for the duration of the call.  In the latter we can safely 
assume the data types will exist for the duration of the execution plan.  
Thus we can probably take a more targeted fix and migrate only ExecBatch to 
using DataType* (or const DataType&).
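
A rough sketch of what the targeted fix could look like follows.  
BorrowedValue, BorrowedBatch and MakeBatch are hypothetical names, and 
DataType is again a stand-in; the real ExecBatch has more fields than this.  
The point is only that the batch borrows its types from a longer-lived owner 
(the caller or the plan), so copying a batch no longer touches any reference 
count.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::DataType (assumption, not the real class).
struct DataType {
  std::string name;
};

// Hypothetical column slot that borrows its type instead of owning it.
struct BorrowedValue {
  const DataType* type;  // non-owning; the owner must outlive the batch
};

// Hypothetical, heavily simplified batch.
struct BorrowedBatch {
  std::vector<BorrowedValue> values;
  std::int64_t length = 0;
};

// Builds a batch whose types are owned elsewhere, e.g. by the plan's schema.
inline BorrowedBatch MakeBatch(
    const std::vector<std::shared_ptr<DataType>>& owned_types,
    std::int64_t length) {
  BorrowedBatch batch;
  batch.length = length;
  batch.values.reserve(owned_types.size());
  for (const auto& type : owned_types) {
    batch.values.push_back(BorrowedValue{type.get()});  // no refcount change
  }
  return batch;
}
```

The trade-off is that lifetime is now a documented contract rather than 
something the type system enforces: a batch that outlives its owner would hold 
dangling pointers.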

On the other hand, we might consider a more global approach.  All of our 
"stock" data types are assumed to have static storage duration.  However, we 
must use std::shared_ptr<DataType> because users could create their own 
extension types.  We could invent an "extension type registration" system where 
extension types must first be registered with the C++ lib before being used.  
Then we could have long-lived DataType instances and we could replace 
std::shared_ptr<DataType> with DataType* (or const DataType&) throughout most 
of the code base.
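
As a sketch of the registration idea, something like the following could 
work.  TypeRegistry and its methods are hypothetical names invented for this 
example, not an existing Arrow API, and DataType is a stand-in.  The registry 
owns every registered type for as long as the registry lives and hands out 
plain pointers that stay valid for that whole time.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for arrow::DataType (assumption, not the real class).
struct DataType {
  std::string name;
};

// Hypothetical registry that owns extension types and hands out stable,
// non-owning pointers to them.
class TypeRegistry {
 public:
  // Registers a type and returns a pointer valid for the registry's
  // lifetime.  If the name is already registered, the existing instance is
  // returned and the new one is discarded.
  const DataType* Register(std::unique_ptr<DataType> type) {
    if (!type) return nullptr;
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = types_.find(type->name);
    if (it != types_.end()) return it->second.get();
    const DataType* raw = type.get();
    types_.emplace(raw->name, std::move(type));
    return raw;
  }

  // Looks up a previously registered type; returns nullptr if unknown.
  const DataType* Find(const std::string& name) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = types_.find(name);
    return it == types_.end() ? nullptr : it->second.get();
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<std::string, std::unique_ptr<DataType>> types_;
};
```

The lock is taken only on registration and lookup, which should be rare 
compared to per-batch work; hot paths would just pass the returned pointer 
around.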

But, as I mentioned, I'm sure there are many approaches to take.  CC 
[~lidavidm] and [~apitrou] and [~yibocai] for thoughts but this might be 
interesting for just about any C++ dev.



