[
https://issues.apache.org/jira/browse/ARROW-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Weston Pace reassigned ARROW-16161:
-----------------------------------
Assignee: Tobias Zagorni
> [C++] Overhead of std::shared_ptr<DataType> copies is causing thread
> contention
> -------------------------------------------------------------------------------
>
> Key: ARROW-16161
> URL: https://issues.apache.org/jira/browse/ARROW-16161
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: C++
> Reporter: Weston Pace
> Assignee: Tobias Zagorni
> Priority: Major
>
> We created a benchmark to measure ExecuteScalarExpression performance in
> ARROW-16014. We noticed significant thread contention (even though there
> shouldn't be much, if any, for this task). As part of ARROW-16138 we have
> been investigating possible causes.
> One cause seems to be contention from copying shared_ptr<DataType> objects.
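To illustrate where the cost might come from, here is a minimal sketch. The DataType struct and both function names are hypothetical stand-ins, not Arrow code. Copying a std::shared_ptr updates the reference count in the shared control block (atomically in a multi-threaded program), so many threads copying pointers to the same type all write to the same memory location. Passing a const reference leaves the count alone.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for arrow::DataType; not the real class.
struct DataType {
  std::string name;
};

// By value: the pointer is copied, so the shared reference count is
// incremented on entry and decremented again when the function returns.
long UseCountByValue(std::shared_ptr<DataType> type) {
  return type.use_count();
}

// By const reference: no copy is made, so the reference count is untouched.
long UseCountByRef(const std::shared_ptr<DataType>& type) {
  return type.use_count();
}
```

With a single owner, the by-value call observes a count of 2 and the by-reference call observes 1. Whether this accounts for the contention seen in the benchmark is exactly what the issue is trying to establish.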
> Two possible solutions jump to mind and I'm sure there are many more.
> ExecBatch is an internal type and used inside of ExecuteScalarExpression as
> well as inside of the execution engine. In the former we can safely assume
> the data types will exist for the duration of the call. In the latter we can
> safely assume the data types will exist for the duration of the execution
> plan. Thus we can probably take a more targetted fix and migrate only
> ExecBatch to using DataType* (or const DataType&).
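A rough sketch of what the targeted option could look like, under stated assumptions: BatchTypes and BorrowTypes are hypothetical names and a much simplified stand-in for ExecBatch, and the caller is assumed to keep the owning shared_ptrs alive for as long as the batch is in use (the duration of the call or of the execution plan).

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::DataType; not the real class.
struct DataType {
  std::string name;
};

// Hypothetical, simplified batch that only borrows its types.
// The raw pointers are non-owning; copying the batch never touches
// any reference count.
struct BatchTypes {
  std::vector<const DataType*> types;
};

// Builds a batch from owners that the caller guarantees will outlive it.
BatchTypes BorrowTypes(const std::vector<std::shared_ptr<DataType>>& owners) {
  BatchTypes batch;
  batch.types.reserve(owners.size());
  for (const auto& owner : owners) {
    batch.types.push_back(owner.get());  // borrow, do not copy the shared_ptr
  }
  return batch;
}
```

The trade-off is that lifetime safety moves from the type system to a documented contract: a batch that outlives its owners would hold dangling pointers.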
> On the other hand, we might consider a more global approach. All of our
> "stock" data types are assumed to have static storage duration. However, we
> must use std::shared_ptr<DataType> because users could create their own
> extension types. We could invent an "extension type registration" system
> where extension types must first be registered with the C++ lib before being
> used. Then we could have long-lived DataType instances and we could replace
> std::shared_ptr<DataType> with DataType* (or const DataType&) throughout most
> of the code base.
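One way the registration idea could be sketched, purely as an assumption about its shape: the TypeRegistry class below and its Register and Find methods are hypothetical and are not an existing Arrow API. The registry takes ownership of each registered type and never releases it, so the raw pointers it hands out stay valid for as long as the registry itself lives.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for arrow::DataType; not the real class.
struct DataType {
  std::string name;
};

// Hypothetical registry that owns extension types and never removes them.
class TypeRegistry {
 public:
  // Registers a type under its name and returns a stable, non-owning pointer.
  // If the name is already registered, the existing instance is kept and
  // returned, and the newly passed instance is discarded.
  const DataType* Register(std::unique_ptr<DataType> type) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::string key = type->name;
    auto it = types_.emplace(std::move(key), std::move(type)).first;
    return it->second.get();
  }

  // Returns the registered type with this name, or nullptr if there is none.
  const DataType* Find(const std::string& name) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = types_.find(name);
    return it == types_.end() ? nullptr : it->second.get();
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<std::string, std::unique_ptr<DataType>> types_;
};
```

In this sketch the lock is only taken during registration and lookup; code that already holds a const DataType* can use it without any synchronization or reference counting. Parameterized types would need a richer key than a name, which the sketch does not attempt.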
> But, as I mentioned, I'm sure there are many approaches to take. CC
> [~lidavidm] and [~apitrou] and [~yibocai] for thoughts but this might be
> interesting for just about any C++ dev.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)