[ 
https://issues.apache.org/jira/browse/ARROW-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600083#comment-17600083
 ] 

Todd Farmer commented on ARROW-16161:
-------------------------------------

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Overhead of std::shared_ptr<DataType> copies is causing thread 
> contention
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-16161
>                 URL: https://issues.apache.org/jira/browse/ARROW-16161
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Tobias Zagorni
>            Priority: Major
>         Attachments: ExecArrayData-difference.txt
>
>
> We created a benchmark to measure ExecuteScalarExpression performance in 
> ARROW-16014.  We noticed significant thread contention (even though there 
> shouldn't be much, if any, for this task).  As part of ARROW-16138 we have 
> been investigating possible causes.
> One cause seems to be contention from copying shared_ptr<DataType> objects.
> Two possible solutions jump to mind and I'm sure there are many more.
> ExecBatch is an internal type and used inside of ExecuteScalarExpression as 
> well as inside of the execution engine.  In the former we can safely assume 
> the data types will exist for the duration of the call.  In the latter we can 
> safely assume the data types will exist for the duration of the execution 
> plan.  Thus we can probably take a more targeted fix and migrate only 
> ExecBatch to using DataType* (or const DataType&).
> On the other hand, we might consider a more global approach.  All of our 
> "stock" data types are assumed to have static storage duration.  However, we 
> must use std::shared_ptr<DataType> because users could create their own 
> extension types.  We could invent an "extension type registration" system 
> where extension types must first be registered with the C++ lib before being 
> used.  Then we could have long-lived DataType instances and we could replace 
> std::shared_ptr<DataType> with DataType* (or const DataType&) throughout most 
> of the code base.
> But, as I mentioned, I'm sure there are many approaches to take.  CC 
> [~lidavidm] and [~apitrou] and [~yibocai] for thoughts but this might be 
> interesting for just about any C++ dev.
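
The first approach in the description (having ExecBatch borrow its data types instead of copying std::shared_ptr<DataType>) can be sketched roughly as below. This is only an illustration under stated assumptions: DataType, OwningBatch, BorrowingBatch and Borrow are hypothetical stand-ins invented for this sketch, not the actual Arrow classes or API. It shows that copying a shared_ptr changes the shared reference count (an atomic update that threads can contend on), while handing out a raw pointer does not.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::DataType; NOT the real Arrow class.
struct DataType {
  explicit DataType(std::string name) : name(std::move(name)) {}
  std::string name;
};

// Owning variant (hypothetical): copying the batch copies every shared_ptr,
// and each copy atomically increments the shared reference count.
struct OwningBatch {
  std::vector<std::shared_ptr<DataType>> types;
};

// Borrowing variant (hypothetical): stores raw pointers and relies on the
// caller keeping the DataType objects alive for the whole call or plan.
struct BorrowingBatch {
  std::vector<const DataType*> types;
};

// Builds a borrowing view of the batch without touching any reference count.
BorrowingBatch Borrow(const OwningBatch& batch) {
  BorrowingBatch out;
  out.types.reserve(batch.types.size());
  for (const auto& type : batch.types) {
    assert(type != nullptr);          // a borrowed pointer must be valid
    out.types.push_back(type.get());  // no reference-count update here
  }
  return out;
}
```

The trade-off is the one the description already names: the borrowing form is only safe where the lifetime of the data types is guaranteed by the surrounding call or execution plan.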



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
