[ 
https://issues.apache.org/jira/browse/ARROW-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719134#comment-16719134
 ] 

Francois Saint-Jacques commented on ARROW-47:
---------------------------------------------

At first sight, I'd say that the memory layout of StructScalar (and Scalar) 
will be critical to the implementation of ARROW-3978 (and to joins on multiple 
columns/expressions). Hashing/probing on the columnar (SoA) representation is a 
performance killer, due to k pointer indirections and cache-line reads, where k 
is the number of fields.
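
To make the layout point concrete, here is a minimal sketch of hashing one 
multi-column key from both layouts. Everything in it is an assumption made for 
illustration: `Mix`, `Columns`, `PackedRows`, `Pack`, `HashRowColumnar` and 
`HashRowPacked` are hypothetical names, not Arrow API, and the keys are 
restricted to int64 fields. It only shows the access pattern; it measures 
nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hash-combine step (boost-style), not taken from Arrow.
inline std::uint64_t Mix(std::uint64_t seed, std::uint64_t v) {
  return seed ^ (v + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2));
}

// SoA: one separate buffer per key field.
using Columns = std::vector<std::vector<std::int64_t>>;

// Hashing row `row` touches k distinct buffers, i.e. k indirections and
// potentially k different cache lines.
std::uint64_t HashRowColumnar(const Columns& cols, std::size_t row) {
  std::uint64_t h = 0;
  for (const auto& col : cols) {
    assert(row < col.size());
    h = Mix(h, static_cast<std::uint64_t>(col[row]));
  }
  return h;
}

// AoS: the k fields of a row are stored contiguously (row-major).
struct PackedRows {
  std::size_t num_fields;
  std::vector<std::int64_t> values;
};

// Hashing row `row` reads one contiguous run of k values.
std::uint64_t HashRowPacked(const PackedRows& rows, std::size_t row) {
  const std::int64_t* p = rows.values.data() + row * rows.num_fields;
  std::uint64_t h = 0;
  for (std::size_t i = 0; i < rows.num_fields; ++i) {
    h = Mix(h, static_cast<std::uint64_t>(p[i]));
  }
  return h;
}

// Transpose the columnar keys into the packed layout once, up front.
PackedRows Pack(const Columns& cols) {
  PackedRows out{cols.size(), {}};
  const std::size_t num_rows = cols.empty() ? 0 : cols[0].size();
  out.values.reserve(num_rows * cols.size());
  for (std::size_t r = 0; r < num_rows; ++r) {
    for (const auto& col : cols) out.values.push_back(col[r]);
  }
  return out;
}
```

Both functions produce the same hash for the same row; the difference is only 
in how many separate buffers each lookup has to touch.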

The second thing is that when we work with intermediate results made of 
`Scalar`s, the types will almost always be homogeneous. For example, when 
computing the hash table of a join/group-by, you'll have something like 
`hash<Scalar, Result>` where the type of each scalar instance is the same 
(minus Null, but we can and should specialize for nullability). Thus adding the 
type shared_ptr _and_ an `is_valid` boolean to every value is somewhat costly 
(16 + 1 + sizeof(primitive_type) bytes per value).
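
As a rough sketch of that overhead, compare a per-value layout with a 
homogeneous collection that stores the type once. The names `DataType`, 
`BoxedInt32` and `Int32Keys` are hypothetical stand-ins chosen for this 
example, not the proposed Arrow classes, and the exact sizes depend on the 
platform and on padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a type descriptor.
struct DataType {
  int id = 0;
};

// Per-value layout: every scalar carries its own type pointer and validity
// flag next to the payload.
struct BoxedInt32 {
  std::shared_ptr<DataType> type;
  bool is_valid;
  std::int32_t value;
};

// Homogeneous layout: the type is stored once for the whole collection and
// each value costs only its payload plus a validity bit.
struct Int32Keys {
  std::shared_ptr<DataType> type;
  std::vector<std::int32_t> values;
  std::vector<bool> valid;

  void Append(std::int32_t v, bool is_valid = true) {
    values.push_back(v);
    valid.push_back(is_valid);
  }

  std::int32_t Get(std::size_t i) const {
    assert(i < values.size() && valid[i]);
    return values[i];
  }
};

// The boxed value is at least as large as the sum of its members; padding
// usually makes it larger still.
static_assert(sizeof(BoxedInt32) >= sizeof(std::shared_ptr<DataType>) +
                                        sizeof(bool) + sizeof(std::int32_t),
              "per-value overhead includes the type pointer and the flag");
```

If nullability is specialized away as suggested, the `valid` vector could be 
dropped entirely for the non-nullable case.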

This optimization can be hidden in the implementation, but I wonder if we'll 
have to expose the collections at the API boundaries.

> [C++] Consider adding a scalar type object model
> ------------------------------------------------
>
>                 Key: ARROW-47
>                 URL: https://issues.apache.org/jira/browse/ARROW-47
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>            Priority: Major
>              Labels: Analytics
>             Fix For: 0.13.0
>
>
> Just did this on the Python side. In later analytics routines, passing in 
> scalar values (example: Array + Scalar) requires some kind of container. Some 
> systems, like the R language, solve this problem with length-1 arrays, but we 
> should do some analysis of use cases and figure out what will work best for 
> Arrow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
