hi Ben, On this possibility
- Make ArrayBuilder::type() virtual. This will be much more expensive for nested builders and for applications which need to branch on ArrayBuilder::type()->id() ArrayBuilder::type_id() should be provided as well. Could you explain what would be a situation where these methods (type() and type_id()) would be on a hot path? Whether virtual or not, I would think that writing code like VisitTypeInline(*builder->type(), func) would not necessarily be advisable for cell-level granularity in general. - Wes On Sun, Aug 18, 2019 at 8:05 PM Sutou Kouhei <k...@clear-code.com> wrote: > > Hi, > > Thanks for working of this. > > This fix is very happy for Arrow GLib. > > Arrow GLib wants array builder's type ID to convert C++ > object to GLib object. Here is a JIRA issue for this use case: > https://issues.apache.org/jira/browse/ARROW-5355 > > In Arrow GLib's use case, ArrayBuilder::type_id() is only > required. Arrow GLib doesn't need ArrayBuilder::type(). > > Furthermore, Arrow GLib just wants to know the type of > ArrayBuilder (not the type of built Array). Arrow GLib > doesn't need ArrayBuilder::type() nor > ArrayBuilder::type_id() if ArrayBuilder::id() or something > (e.g. visitor API for ArrayBuilder) is provided. Arrow GLib > can use dynamic_cast and NULL check to detect the real > ArrowBuilder's class. But I don't want to use it as much as > possible. > > > Thanks, > -- > kou > > In <CALPsrS2FdQWnZtEs=tnwl23+q+oyery92gcn6dy3gxdn3y8...@mail.gmail.com> > "[DISCUSS] ArrayBuilders with mutable type" on Fri, 16 Aug 2019 16:40:15 > -0400, > Ben Kietzman <ben.kietz...@rstudio.com> wrote: > > > For some array builders, ArrayBuilder::type() will be different from the > > type of array produced by ArrayBuilder::Finish(). These are: > > - AdaptiveIntBuilder will progress through {int8, int16, int32, int64} > > whenever a value is inserted which cannot be stored using the current > > integer type. > > - DictionaryBuilder will similarly increase the width of its indices if its > > memo table grows too large. > > - {Dense,Sparse}UnionBuilder may append a new child builder > > - Any nested builder whose child builders include a builder with mutable > > type > > > > IMHO if ArrayBuilder::type is sporadically inaccurate then it's a user > > hostile API and needs to be fixed. > > > > The current solution solution is for mutable type to be marked by > > ArrayBuilder::type() == null. This results in significant loss of metadata > > from nested types; for example StructBuilder::FinishInternal currently sets > > all field names to "" if constructed with null type. Null type is > > inconsistently applied; a builder of list(dictionary()) will currently > > finish to an invalid array if the dictionary builder widens its indices > > before finishing. > > > > Options: > > - Implement array builders such that ArrayBuilder::type() is always the > > type to which the builder would Finish. There is a PR for this > > https://github.com/apache/arrow/pull/4930 but it introduces performance > > regressions for the dictionary builders: 5% if the values are integer, 1.8% > > if they are strings. > > - Provide ArrayBuilder::UpdateType(); type() is not guaranteed to be > > accurate unless immediately preceded by UpdateType(). > > - Remove ArrayBuilder::type() in favor of ArrayBuilder::type_id(), which > > will be an immutable property of ArrayBuilders. > > - Make ArrayBuilder::type() virtual. This will be much more expensive for > > nested builders and for applications which need to branch on > > ArrayBuilder::type()->id() ArrayBuilder::type_id() should be provided as > > well.