[
https://issues.apache.org/jira/browse/ARROW-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719077#comment-16719077
]
Wes McKinney commented on ARROW-47:
-----------------------------------
I had been thinking about having standalone in-struct storage for primitive
types and using Array/ArrayData for list scalars at least. For binary I'm not
sure. I just sketched the following:
{code:c++}
// Common base: carries the logical type and a validity (non-null) flag.
class Scalar {
 public:
  std::shared_ptr<DataType> type() const { return type_; }
  bool is_valid() const { return is_valid_; }

 protected:
  std::shared_ptr<DataType> type_;
  bool is_valid_;
};

// Primitive values are stored inline in the object.
template <typename Type>
class PrimitiveScalar : public Scalar {
 public:
  using T = typename Type::c_type;
  T value() const { return value_; }

 private:
  T value_;
};

// Binary values are held in a shared Buffer.
class BinaryScalar : public Scalar {
 protected:
  std::shared_ptr<Buffer> value_;
};

// List values are held as an Array of the child elements.
class ListScalar : public Scalar {
 protected:
  std::shared_ptr<Array> value_;
};
{code}
I think using {{Buffer}} for binary scalars is probably the lightest-weight
option that still ensures memory lifetime. In practice (e.g. in analytics code
paths), we will do dynamic dispatch on the type, so as long as we have a
reasonable base class that exposes the type, we should try to make the object
as lightweight / simple as possible.
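
For illustration only, here is a minimal, self-contained sketch of what
dispatching on the type of such a scalar might look like. It does not use
Arrow itself: {{TypeId}}, the one-field {{DataType}}, {{Int64Type}},
{{DoubleType}}, the constructors, and {{ToString}} are all hypothetical
stand-ins invented for this example, not part of the proposal above.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the real type machinery (assumptions).
enum class TypeId { INT64, DOUBLE };

struct DataType {
  explicit DataType(TypeId id) : id(id) {}
  TypeId id;
};

struct Int64Type {
  using c_type = std::int64_t;
  static constexpr TypeId type_id() { return TypeId::INT64; }
};

struct DoubleType {
  using c_type = double;
  static constexpr TypeId type_id() { return TypeId::DOUBLE; }
};

// Lightweight base class: only the type and the validity flag.
class Scalar {
 public:
  std::shared_ptr<DataType> type() const { return type_; }
  bool is_valid() const { return is_valid_; }

 protected:
  Scalar(std::shared_ptr<DataType> type, bool is_valid)
      : type_(std::move(type)), is_valid_(is_valid) {}

  std::shared_ptr<DataType> type_;
  bool is_valid_;
};

// Primitive value stored inline; the constructor is an assumption.
template <typename Type>
class PrimitiveScalar : public Scalar {
 public:
  using T = typename Type::c_type;

  explicit PrimitiveScalar(T value, bool is_valid = true)
      : Scalar(std::make_shared<DataType>(Type::type_id()), is_valid),
        value_(value) {}

  T value() const { return value_; }

 private:
  T value_;
};

// Dispatch on the runtime type id, then downcast to the concrete class.
std::string ToString(const Scalar& scalar) {
  if (!scalar.is_valid()) return "null";
  switch (scalar.type()->id) {
    case TypeId::INT64:
      return std::to_string(
          static_cast<const PrimitiveScalar<Int64Type>&>(scalar).value());
    case TypeId::DOUBLE:
      return std::to_string(
          static_cast<const PrimitiveScalar<DoubleType>&>(scalar).value());
  }
  return "unknown";
}
```

Because the dispatch goes through the type id, the base class needs no
virtual methods in this sketch; the caller only ever inspects {{type()}} and
{{is_valid()}} before casting.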
I don't anticipate we'll be dealing with large collections of these scalar
objects. But we want a suitable way to algebraically represent the data when
executing a computation graph (and when writing kernel implementations that
may accept either array or scalar arguments, with "broadcasting").
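
As a rough illustration of a kernel that accepts either array or scalar
arguments, the sketch below adds two operands and repeats a scalar across
every row of an array. It is a toy under stated assumptions: {{Int64Array}},
{{Operand}}, {{ValueAt}}, and {{Add}} are hypothetical names, plain
{{std::vector}} stands in for an array, nulls are ignored, and C++17 is
required for {{std::variant}}.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical operand: either a whole column of values or one scalar.
using Int64Array = std::vector<std::int64_t>;
using Operand = std::variant<Int64Array, std::int64_t>;

// Value of an operand at row i; a scalar yields the same value for every row.
std::int64_t ValueAt(const Operand& operand, std::size_t i) {
  if (const auto* array = std::get_if<Int64Array>(&operand)) {
    return (*array)[i];
  }
  return std::get<std::int64_t>(operand);
}

// Element-wise addition with scalar broadcasting.
Operand Add(const Operand& left, const Operand& right) {
  const auto* l = std::get_if<Int64Array>(&left);
  const auto* r = std::get_if<Int64Array>(&right);

  // Scalar + scalar stays a scalar.
  if (l == nullptr && r == nullptr) {
    return std::get<std::int64_t>(left) + std::get<std::int64_t>(right);
  }
  // Array + array requires matching lengths.
  if (l != nullptr && r != nullptr && l->size() != r->size()) {
    throw std::invalid_argument("array operands must have equal length");
  }
  // At least one side is an array; its length decides the output length.
  const std::size_t length = (l != nullptr) ? l->size() : r->size();
  Int64Array out(length);
  for (std::size_t i = 0; i < length; ++i) {
    out[i] = ValueAt(left, i) + ValueAt(right, i);
  }
  return out;
}
```

With this shape, the kernel body is written once, and the array/scalar
distinction is confined to {{ValueAt}}, so no length-1 array has to be
materialized for the scalar case.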
> [C++] Consider adding a scalar type object model
> ------------------------------------------------
>
> Key: ARROW-47
> URL: https://issues.apache.org/jira/browse/ARROW-47
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Assignee: Uwe L. Korn
> Priority: Major
> Labels: Analytics
> Fix For: 0.13.0
>
>
> Just did this on the Python side. In later analytics routines, passing in
> scalar values (example: Array + Scalar) requires some kind of container. Some
> systems, like the R language, solve this problem with length-1 arrays, but we
> should do some analysis of use cases and figure out what will work best for
> Arrow.