Hi,

Thanks for working of this.

This fix is very happy for Arrow GLib.

Arrow GLib wants array builder's type ID to convert C++
object to GLib object. Here is a JIRA issue for this use case:
  https://issues.apache.org/jira/browse/ARROW-5355

In Arrow GLib's use case, ArrayBuilder::type_id() is only
required. Arrow GLib doesn't need ArrayBuilder::type().

Furthermore, Arrow GLib just wants to know the type of
ArrayBuilder (not the type of built Array). Arrow GLib
doesn't need ArrayBuilder::type() nor
ArrayBuilder::type_id() if ArrayBuilder::id() or something
(e.g. visitor API for ArrayBuilder) is provided. Arrow GLib
can use dynamic_cast and NULL check to detect the real
ArrowBuilder's class. But I don't want to use it as much as
possible.


Thanks,
--
kou

In <CALPsrS2FdQWnZtEs=tnwl23+q+oyery92gcn6dy3gxdn3y8...@mail.gmail.com>
  "[DISCUSS] ArrayBuilders with mutable type" on Fri, 16 Aug 2019 16:40:15 
-0400,
  Ben Kietzman <ben.kietz...@rstudio.com> wrote:

> For some array builders, ArrayBuilder::type() will be different from the
> type of array produced by ArrayBuilder::Finish(). These are:
> - AdaptiveIntBuilder will progress through {int8, int16, int32, int64}
> whenever a value is inserted which cannot be stored using the current
> integer type.
> - DictionaryBuilder will similarly increase the width of its indices if its
> memo table grows too large.
> - {Dense,Sparse}UnionBuilder may append a new child builder
> - Any nested builder whose child builders include a builder with mutable
> type
> 
> IMHO if ArrayBuilder::type is sporadically inaccurate then it's a user
> hostile API and needs to be fixed.
> 
> The current solution solution is for mutable type to be marked by
> ArrayBuilder::type() == null. This results in significant loss of metadata
> from nested types; for example StructBuilder::FinishInternal currently sets
> all field names to "" if constructed with null type. Null type is
> inconsistently applied; a builder of list(dictionary()) will currently
> finish to an invalid array if the dictionary builder widens its indices
> before finishing.
> 
> Options:
> - Implement array builders such that ArrayBuilder::type() is always the
> type to which the builder would Finish. There is a PR for this
> https://github.com/apache/arrow/pull/4930 but it introduces performance
> regressions for the dictionary builders: 5% if the values are integer, 1.8%
> if they are strings.
> - Provide ArrayBuilder::UpdateType(); type() is not guaranteed to be
> accurate unless immediately preceded by UpdateType().
> - Remove ArrayBuilder::type() in favor of ArrayBuilder::type_id(), which
> will be an immutable property of ArrayBuilders.
> - Make ArrayBuilder::type() virtual. This will be much more expensive for
> nested builders and for applications which need to branch on
> ArrayBuilder::type()->id() ArrayBuilder::type_id() should be provided as
> well.

Reply via email to