[ https://issues.apache.org/jira/browse/ARROW-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724232#comment-16724232 ]
Benjamin Kietzman commented on ARROW-4067: ------------------------------------------ Related: It would be useful to add {{VisitBuilderInline}} to visit_inline.h > [C++] RFC: standardize ArrayBuilder subclasses > ---------------------------------------------- > > Key: ARROW-4067 > URL: https://issues.apache.org/jira/browse/ARROW-4067 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ > Reporter: Benjamin Kietzman > Priority: Minor > Labels: usability > > Each builder supports different and frequently differently named methods for > appending. It should be possible to establish a more consistent convention, > which would alleviate dev confusion and simplify generics. > For example, let all Builders be required to define at minimum: > * {{Reserve(int64_t)}} > * a nested type named {{Scalar}}, which is the canonical scalar appended to > this builder. Append other types may be supported for convenience. > * {{UnsafeAppend(Scalar)}} > * {{UnsafeAppendNull()}} > The other methods described below can be overridden if an optimization is > available or left defaulted (a CRTP helper can contain the default > implementations, for example {{Append(Scalar)}} would simply be a call to > Reserve then UnsafeAppend. > In addition to their unsafe equivalents, {{Append(Scalar)}} and > {{AppendNull()}} should be available for appending without manual capacity > maintenance. > It is not necessary for the rest of this RFC, but it would simplify builders > further if scalar append methods always had a single argument. For example, > this would mean abolishing {{BinaryBuilder::Append(const uint8_t*, int32_t)}} > in favor of {{BinaryBuilder::Append(basic_string_view<uint8_t>)}}. There's no > runtime overhead involved in this change, and developers who have a pointer > and a length instead of a view can just construct one without boilerplate > using brace initialization: {code}b->Append({pointer, length});{code} > Unsafe and safe methods should be provided for appending multiple values as > well. The default implementation will be a trivial loop but if optimizations > are available then this could be overridden (for example instead of copying > bits one by one into a BooleanBuilder, bytes could be memcpy'd). Append > methods for multiple values should accept two arguments, the first of which > contains values and the second of which defines validity. The canonical > multiple append method has signature {{Status(array_view<Scalar> values, > const uint8_t* valid_bytes)}}, but other overloads and helpers could be > provided as well: > {code} > b->Append({{1, 3, 4}}, all_valid); // append values with no nulls > b->Append({{1, 3, 4}}, bool_vector); // use the elements of a vector<bool> > for validity > b->Append({{1, 3, 4}}, bits(ptr)); // interpret ptr as a buffer of valid > bits, rather than valid bytes > {code} > Builders of nested types currently require developers to write boilerplate > wrangling the child builders. This could be alleviated by letting nested > builders' append methods return a helper as an output argument: > {code} > ListBuilder::List<Int32Type> lst; > RETURN_NOT_OK(list_builder.Append(&lst)); // ListBuilder::Scalar == > ListBuilder::ListBase* > RETURN_NOT_OK(lst->Append(3)); > RETURN_NOT_OK(lst->Append(4)); > StructBuilder::Struct strct; > RETURN_NOT_OK(struct_builder.Append(&strct)); > RETURN_NOT_OK(strct.Set<StringType>(0, "uuid")); > RETURN_NOT_OK(strct.Set<Int32Type>(2, 47)); > RETURN_NOT_OK(strct->Finish()); // appends null to unspecified fields > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)