[ 
https://issues.apache.org/jira/browse/ARROW-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719077#comment-16719077
 ] 

Wes McKinney commented on ARROW-47:
-----------------------------------

I had been thinking about using standalone in-struct storage for primitive 
types, and Array/ArrayData for list scalars at least. For binary I'm not 
sure. I just sketched the following:

{code:c++}

class Scalar {
 public:
  // Accessors must be public so callers can dispatch on the type.
  std::shared_ptr<DataType> type() const { return type_; }

  bool is_valid() const { return is_valid_; }

 protected:
  std::shared_ptr<DataType> type_;
  bool is_valid_;
};

template <typename Type>
class PrimitiveScalar : public Scalar {
 public:
  using T = typename Type::c_type;

  T value() const { return value_; }

 private:
  T value_;
};

class BinaryScalar : public Scalar {
 protected:
  std::shared_ptr<Buffer> value_;
};

class ListScalar : public Scalar {
 protected:
  std::shared_ptr<Array> value_;
};
{code}

I think using {{Buffer}} for binary scalars is probably the lightest-weight 
option that also guarantees memory lifetime. In practice (e.g. in analytics 
code paths), we will do dynamic dispatch on the type, so as long as we have a 
reasonable base class that exposes the type, we should try to make the object 
as lightweight / simple as possible.
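
To make the dispatch idea concrete, here is a minimal self-contained sketch. 
It does not use the Arrow headers: {{TypeId}}, the one-field {{DataType}}, 
the two-parameter {{PrimitiveScalar}} template and {{ToString}} are all 
hypothetical stand-ins invented for this example, so treat it as an 
illustration of the pattern rather than a proposed API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; not the Arrow classes.
enum class TypeId { INT64, DOUBLE };

struct DataType {
  explicit DataType(TypeId id) : id(id) {}
  TypeId id;
};

// Lightweight base class: only the type and the validity flag.
class Scalar {
 public:
  const std::shared_ptr<DataType>& type() const { return type_; }
  bool is_valid() const { return is_valid_; }

 protected:
  Scalar(std::shared_ptr<DataType> type, bool is_valid)
      : type_(std::move(type)), is_valid_(is_valid) {}

  std::shared_ptr<DataType> type_;
  bool is_valid_;
};

// Primitive values are stored inline in the object.
template <typename T, TypeId kId>
class PrimitiveScalar : public Scalar {
 public:
  explicit PrimitiveScalar(T value, bool is_valid = true)
      : Scalar(std::make_shared<DataType>(kId), is_valid), value_(value) {}

  T value() const { return value_; }

 private:
  T value_;
};

using Int64Scalar = PrimitiveScalar<int64_t, TypeId::INT64>;
using DoubleScalar = PrimitiveScalar<double, TypeId::DOUBLE>;

// Dispatch on the runtime type id exposed by the base class, then downcast
// to the concrete scalar class to read the value.
std::string ToString(const Scalar& s) {
  assert(s.type() != nullptr);
  if (!s.is_valid()) return "null";
  switch (s.type()->id) {
    case TypeId::INT64:
      return std::to_string(static_cast<const Int64Scalar&>(s).value());
    case TypeId::DOUBLE:
      return std::to_string(static_cast<const DoubleScalar&>(s).value());
  }
  return "unknown";
}
```

With these stand-ins, {{ToString(Int64Scalar(42))}} yields {{"42"}} and an 
invalid scalar yields {{"null"}}. The base class needs no virtual methods 
for this to work, since the type id alone selects the downcast.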

I don't anticipate we'll be dealing with large collections of these scalar 
objects. But we want a suitable way to represent the data algebraically 
when executing a computation graph, and when writing kernel implementations 
that may accept either array or scalar arguments (with "broadcasting").
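
As a rough illustration of a kernel that accepts either array or scalar 
arguments, the sketch below adds two int64 inputs and broadcasts a scalar 
against an array. Everything in it is hypothetical: {{Int64Datum}} is a 
made-up tagged container (a plain {{std::vector}} stands in for an array), 
and {{Add}} is not an existing Arrow function. Nulls are ignored to keep it 
short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical argument container: either one scalar value or an array.
struct Int64Datum {
  bool is_scalar;
  std::vector<int64_t> values;  // exactly one element when is_scalar is true

  static Int64Datum Scalar(int64_t v) { return Int64Datum{true, {v}}; }
  static Int64Datum Array(std::vector<int64_t> v) {
    return Int64Datum{false, std::move(v)};
  }

  // Element i of the argument; a scalar yields the same value for every i.
  int64_t At(std::size_t i) const { return is_scalar ? values[0] : values[i]; }
};

// Element-wise addition with scalar broadcasting.
// scalar + scalar returns a scalar; any array operand makes the result an
// array. Two array operands must have the same length.
Int64Datum Add(const Int64Datum& left, const Int64Datum& right) {
  if (left.is_scalar && right.is_scalar) {
    return Int64Datum::Scalar(left.At(0) + right.At(0));
  }
  if (!left.is_scalar && !right.is_scalar) {
    assert(left.values.size() == right.values.size());
  }
  const std::size_t length =
      left.is_scalar ? right.values.size() : left.values.size();
  std::vector<int64_t> out(length);
  for (std::size_t i = 0; i < length; ++i) {
    out[i] = left.At(i) + right.At(i);
  }
  return Int64Datum::Array(std::move(out));
}
```

For example, adding {{Array({1, 2, 3})}} and {{Scalar(10)}} in either order 
gives an array holding 11, 12, 13. The kernel body is written once against 
{{At(i)}}, so it does not need separate array/scalar code paths.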

> [C++] Consider adding a scalar type object model
> ------------------------------------------------
>
>                 Key: ARROW-47
>                 URL: https://issues.apache.org/jira/browse/ARROW-47
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>            Priority: Major
>              Labels: Analytics
>             Fix For: 0.13.0
>
>
> Just did this on the Python side. In later analytics routines, passing in 
> scalar values (example: Array + Scalar) requires some kind of container. Some 
> systems, like the R language, solve this problem with length-1 arrays, but we 
> should do some analysis of use cases and figure out what will work best for 
> Arrow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
