Hi, Thanks for working of this.
This fix is very happy for Arrow GLib. Arrow GLib wants array builder's type ID to convert C++ object to GLib object. Here is a JIRA issue for this use case: https://issues.apache.org/jira/browse/ARROW-5355 In Arrow GLib's use case, ArrayBuilder::type_id() is only required. Arrow GLib doesn't need ArrayBuilder::type(). Furthermore, Arrow GLib just wants to know the type of ArrayBuilder (not the type of built Array). Arrow GLib doesn't need ArrayBuilder::type() nor ArrayBuilder::type_id() if ArrayBuilder::id() or something (e.g. visitor API for ArrayBuilder) is provided. Arrow GLib can use dynamic_cast and NULL check to detect the real ArrowBuilder's class. But I don't want to use it as much as possible. Thanks, -- kou In <CALPsrS2FdQWnZtEs=tnwl23+q+oyery92gcn6dy3gxdn3y8...@mail.gmail.com> "[DISCUSS] ArrayBuilders with mutable type" on Fri, 16 Aug 2019 16:40:15 -0400, Ben Kietzman <ben.kietz...@rstudio.com> wrote: > For some array builders, ArrayBuilder::type() will be different from the > type of array produced by ArrayBuilder::Finish(). These are: > - AdaptiveIntBuilder will progress through {int8, int16, int32, int64} > whenever a value is inserted which cannot be stored using the current > integer type. > - DictionaryBuilder will similarly increase the width of its indices if its > memo table grows too large. > - {Dense,Sparse}UnionBuilder may append a new child builder > - Any nested builder whose child builders include a builder with mutable > type > > IMHO if ArrayBuilder::type is sporadically inaccurate then it's a user > hostile API and needs to be fixed. > > The current solution solution is for mutable type to be marked by > ArrayBuilder::type() == null. This results in significant loss of metadata > from nested types; for example StructBuilder::FinishInternal currently sets > all field names to "" if constructed with null type. Null type is > inconsistently applied; a builder of list(dictionary()) will currently > finish to an invalid array if the dictionary builder widens its indices > before finishing. > > Options: > - Implement array builders such that ArrayBuilder::type() is always the > type to which the builder would Finish. There is a PR for this > https://github.com/apache/arrow/pull/4930 but it introduces performance > regressions for the dictionary builders: 5% if the values are integer, 1.8% > if they are strings. > - Provide ArrayBuilder::UpdateType(); type() is not guaranteed to be > accurate unless immediately preceded by UpdateType(). > - Remove ArrayBuilder::type() in favor of ArrayBuilder::type_id(), which > will be an immutable property of ArrayBuilders. > - Make ArrayBuilder::type() virtual. This will be much more expensive for > nested builders and for applications which need to branch on > ArrayBuilder::type()->id() ArrayBuilder::type_id() should be provided as > well.