hi Ben,

On this possibility

- Make ArrayBuilder::type() virtual. This will be much more expensive for
nested builders and for applications which need to branch on
ArrayBuilder::type()->id() ArrayBuilder::type_id() should be provided as
well.

Could you explain what would be a situation where these methods
(type() and type_id()) would be on a hot path? Whether virtual or not,
I would think that writing code like

VisitTypeInline(*builder->type(), func)

would not necessarily be advisable for cell-level granularity in general.

- Wes

On Sun, Aug 18, 2019 at 8:05 PM Sutou Kouhei <k...@clear-code.com> wrote:
>
> Hi,
>
> Thanks for working of this.
>
> This fix is very happy for Arrow GLib.
>
> Arrow GLib wants array builder's type ID to convert C++
> object to GLib object. Here is a JIRA issue for this use case:
>   https://issues.apache.org/jira/browse/ARROW-5355
>
> In Arrow GLib's use case, ArrayBuilder::type_id() is only
> required. Arrow GLib doesn't need ArrayBuilder::type().
>
> Furthermore, Arrow GLib just wants to know the type of
> ArrayBuilder (not the type of built Array). Arrow GLib
> doesn't need ArrayBuilder::type() nor
> ArrayBuilder::type_id() if ArrayBuilder::id() or something
> (e.g. visitor API for ArrayBuilder) is provided. Arrow GLib
> can use dynamic_cast and NULL check to detect the real
> ArrowBuilder's class. But I don't want to use it as much as
> possible.
>
>
> Thanks,
> --
> kou
>
> In <CALPsrS2FdQWnZtEs=tnwl23+q+oyery92gcn6dy3gxdn3y8...@mail.gmail.com>
>   "[DISCUSS] ArrayBuilders with mutable type" on Fri, 16 Aug 2019 16:40:15 
> -0400,
>   Ben Kietzman <ben.kietz...@rstudio.com> wrote:
>
> > For some array builders, ArrayBuilder::type() will be different from the
> > type of array produced by ArrayBuilder::Finish(). These are:
> > - AdaptiveIntBuilder will progress through {int8, int16, int32, int64}
> > whenever a value is inserted which cannot be stored using the current
> > integer type.
> > - DictionaryBuilder will similarly increase the width of its indices if its
> > memo table grows too large.
> > - {Dense,Sparse}UnionBuilder may append a new child builder
> > - Any nested builder whose child builders include a builder with mutable
> > type
> >
> > IMHO if ArrayBuilder::type is sporadically inaccurate then it's a user
> > hostile API and needs to be fixed.
> >
> > The current solution solution is for mutable type to be marked by
> > ArrayBuilder::type() == null. This results in significant loss of metadata
> > from nested types; for example StructBuilder::FinishInternal currently sets
> > all field names to "" if constructed with null type. Null type is
> > inconsistently applied; a builder of list(dictionary()) will currently
> > finish to an invalid array if the dictionary builder widens its indices
> > before finishing.
> >
> > Options:
> > - Implement array builders such that ArrayBuilder::type() is always the
> > type to which the builder would Finish. There is a PR for this
> > https://github.com/apache/arrow/pull/4930 but it introduces performance
> > regressions for the dictionary builders: 5% if the values are integer, 1.8%
> > if they are strings.
> > - Provide ArrayBuilder::UpdateType(); type() is not guaranteed to be
> > accurate unless immediately preceded by UpdateType().
> > - Remove ArrayBuilder::type() in favor of ArrayBuilder::type_id(), which
> > will be an immutable property of ArrayBuilders.
> > - Make ArrayBuilder::type() virtual. This will be much more expensive for
> > nested builders and for applications which need to branch on
> > ArrayBuilder::type()->id() ArrayBuilder::type_id() should be provided as
> > well.

Reply via email to