[ 
https://issues.apache.org/jira/browse/ARROW-16138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519837#comment-17519837
 ] 

Tobias Zagorni edited comment on ARROW-16138 at 4/8/22 10:57 PM:
-----------------------------------------------------------------

The thread contention at small batch sizes is largely caused by 
copying/destructing shared pointers to DataType. Different threads constantly 
changing the refcount of the Int64 DataType seems to cause a lot of inter-core 
synchronization.

!Flamegraph.png|width=891,height=479!
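
As an aside, the effect is easy to reproduce outside of Arrow. Below is a 
minimal standalone sketch (plain standard C++, not Arrow code; DataTypeLike is 
a made-up stand-in for arrow::DataType): several threads repeatedly copy the 
same std::shared_ptr, versus taking the same object by const reference. The 
copies all contend on the atomic refcount in the shared control block, which is 
the kind of inter-core traffic the flamegraph points at.

{code:cpp}
// Hypothetical, self-contained demo (not Arrow code): N threads hammering the
// refcount of one shared object vs. using it through a const reference.
#include <chrono>
#include <iostream>
#include <memory>
#include <thread>
#include <vector>

struct DataTypeLike { int id = 0; };  // made-up stand-in for arrow::DataType

// Simulates kernel code receiving the type as a shared_ptr copy:
// every call atomically increments and decrements the refcount.
int use_by_copy(std::shared_ptr<DataTypeLike> type) { return type->id; }

// Simulates passing the same type by const reference: no refcount traffic.
int use_by_ref(const DataTypeLike& type) { return type.id; }

template <typename Fn>
double time_threads(int num_threads, int iterations, Fn fn) {
  auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      long sink = 0;
      for (int i = 0; i < iterations; ++i) sink += fn();
      volatile long keep = sink;  // keep the loop from being optimized away
      (void)keep;
    });
  }
  for (auto& w : workers) w.join();
  return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
}

int main() {
  auto int64_type = std::make_shared<DataTypeLike>();
  const int kThreads = 8;
  const int kIters = 5000000;

  double copy_s = time_threads(kThreads, kIters, [&] { return use_by_copy(int64_type); });
  double ref_s = time_threads(kThreads, kIters, [&] { return use_by_ref(*int64_type); });

  // With all threads touching the same control block, the copy version is
  // typically several times slower than the const-reference version.
  std::cout << "shared_ptr copies: " << copy_s << " s\n"
            << "const reference:   " << ref_s << " s\n";
  return 0;
}
{code}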


was (Author: JIRAUSER286565):
The thread contention at small batch sizes is largely caused by 
copying/destructing pointers to DataType. Different threads constantly changing 
the refcount of the Int64 DataType seems to cause a lot of inter-core 
synchronization.

!Flamegraph.png|width=891,height=479!

> [C++] Improve performance of ExecuteScalarExpression
> ----------------------------------------------------
>
>                 Key: ARROW-16138
>                 URL: https://issues.apache.org/jira/browse/ARROW-16138
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Tobias Zagorni
>            Priority: Major
>         Attachments: Flamegraph.png
>
>
> One of the things we want to be able to do in the streaming execution engine 
> is process data in small L2 sized batches.  Based on literature we might like 
> to use batches somewhere in the range of 1k to 16k rows.  In ARROW-16014 we 
> created a benchmark to measure the performance of ExecuteScalarExpression as 
> the size of our batches got smaller.  There are two things we observed:
>  * Something is causing thread contention.  We should be able to get pretty 
> close to perfect linear speedup when we are evaluating scalar expressions and 
> the batch size fits entirely into L2.  We are not seeing that.
>  * The overhead of ExecuteScalarExpression is too high when processing small 
> batches.  Even when the expression is doing real work (e.g. copies, 
> comparisons), the execution time starts to be dominated by overhead for 
> batches of around 10k rows.
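
For reference, here is a minimal sketch of the kind of measurement described 
above: every thread evaluates a trivial bound expression over the same small, 
L2-resident batch, and single-threaded throughput is compared against 
multi-threaded throughput. This is not the ARROW-16014 benchmark; it is written 
against what I believe is the Arrow 8.x C++ API (in particular the 
ExecuteScalarExpression overload taking a bound Expression and an ExecBatch), 
so header locations and signatures may need adjusting for other versions.

{code:cpp}
// Hypothetical scaling check, not the ARROW-16014 benchmark. The
// ExecuteScalarExpression(expr, ExecBatch, ExecContext*) overload and the
// header paths below are assumptions based on the Arrow 8.x C++ API.
#include <arrow/api.h>
#include <arrow/compute/api.h>
#include <arrow/compute/exec.h>             // ExecBatch
#include <arrow/compute/exec/expression.h>  // Expression, ExecuteScalarExpression

#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

namespace cp = arrow::compute;

int main() {
  constexpr int64_t kRows = 4096;    // ~32 KiB of int64 values: comfortably L2-resident
  constexpr int kIterations = 10000;

  // Build one small batch with a single int64 column "x".
  std::vector<int64_t> values(kRows);
  for (int64_t i = 0; i < kRows; ++i) values[i] = i;
  arrow::Int64Builder builder;
  if (!builder.AppendValues(values).ok()) return 1;
  std::shared_ptr<arrow::Array> array = builder.Finish().ValueOrDie();

  auto schema = arrow::schema({arrow::field("x", arrow::int64())});
  cp::ExecBatch batch({arrow::Datum(array)}, kRows);

  // A trivial scalar expression: x + 1, bound once up front.
  cp::Expression expr = cp::call("add", {cp::field_ref("x"), cp::literal(int64_t{1})});
  cp::Expression bound = expr.Bind(*schema).ValueOrDie();

  auto run = [&](int num_threads) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
      workers.emplace_back([&] {
        for (int i = 0; i < kIterations; ++i) {
          // Every thread evaluates the same expression on the same small batch.
          if (!cp::ExecuteScalarExpression(bound, batch).ok()) std::abort();
        }
      });
    }
    for (auto& w : workers) w.join();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    double mrows_per_sec =
        double(kRows) * kIterations * num_threads / elapsed.count() / 1e6;
    std::cout << num_threads << " thread(s): " << mrows_per_sec << " Mrows/s\n";
  };

  // With L2-resident batches and no shared mutable state, throughput should
  // scale close to linearly with the thread count; a large gap points at
  // contention inside ExecuteScalarExpression itself.
  run(1);
  run(4);
  return 0;
}
{code}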


