[
https://issues.apache.org/jira/browse/ARROW-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wes McKinney updated ARROW-4067:
--------------------------------
Fix Version/s: (was: 0.17.0)
1.0.0
> [C++] RFC: standardize ArrayBuilder subclasses
> ----------------------------------------------
>
> Key: ARROW-4067
> URL: https://issues.apache.org/jira/browse/ARROW-4067
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Ben Kietzman
> Priority: Minor
> Labels: usability
> Fix For: 1.0.0
>
>
> Each builder supports different and frequently differently named methods for
> appending. It should be possible to establish a more consistent convention,
> which would alleviate dev confusion and simplify generics.
> For example, let all Builders be required to define at minimum:
> * {{Reserve(int64_t)}}
> * a nested type named {{Scalar}}, which is the canonical scalar appended to
> this builder. Append other types may be supported for convenience.
> * {{UnsafeAppend(Scalar)}}
> * {{UnsafeAppendNull()}}
> The other methods described below can be overridden if an optimization is
> available or left defaulted (a CRTP helper can contain the default
> implementations, for example {{Append(Scalar)}} would simply be a call to
> Reserve then UnsafeAppend.
> In addition to their unsafe equivalents, {{Append(Scalar)}} and
> {{AppendNull()}} should be available for appending without manual capacity
> maintenance.
> It is not necessary for the rest of this RFC, but it would simplify builders
> further if scalar append methods always had a single argument. For example,
> this would mean abolishing {{BinaryBuilder::Append(const uint8_t*, int32_t)}}
> in favor of {{BinaryBuilder::Append(basic_string_view<uint8_t>)}}. There's no
> runtime overhead involved in this change, and developers who have a pointer
> and a length instead of a view can just construct one without boilerplate
> using brace initialization: {code}b->Append({pointer, length});{code}
> Unsafe and safe methods should be provided for appending multiple values as
> well. The default implementation will be a trivial loop but if optimizations
> are available then this could be overridden (for example instead of copying
> bits one by one into a BooleanBuilder, bytes could be memcpy'd). Append
> methods for multiple values should accept two arguments, the first of which
> contains values and the second of which defines validity. The canonical
> multiple append method has signature {{Status(array_view<Scalar> values,
> const uint8_t* valid_bytes)}}, but other overloads and helpers could be
> provided as well:
> {code}
> b->Append({{1, 3, 4}}, all_valid); // append values with no nulls
> b->Append({{1, 3, 4}}, bool_vector); // use the elements of a vector<bool>
> for validity
> b->Append({{1, 3, 4}}, bits(ptr)); // interpret ptr as a buffer of valid
> bits, rather than valid bytes
> {code}
> Builders of nested types currently require developers to write boilerplate
> wrangling the child builders. This could be alleviated by letting nested
> builders' append methods return a helper as an output argument:
> {code}
> ListBuilder::List<Int32Type> lst;
> RETURN_NOT_OK(list_builder.Append(&lst)); // ListBuilder::Scalar ==
> ListBuilder::ListBase*
> RETURN_NOT_OK(lst->Append(3));
> RETURN_NOT_OK(lst->Append(4));
> StructBuilder::Struct strct;
> RETURN_NOT_OK(struct_builder.Append(&strct));
> RETURN_NOT_OK(strct.Set<StringType>(0, "uuid"));
> RETURN_NOT_OK(strct.Set<Int32Type>(2, 47));
> RETURN_NOT_OK(strct->Finish()); // appends null to unspecified fields
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)