For some array builders, ArrayBuilder::type() will be different from the type of array produced by ArrayBuilder::Finish(). These are:

- AdaptiveIntBuilder will progress through {int8, int16, int32, int64} whenever a value is inserted which cannot be stored using the current integer type.
- DictionaryBuilder will similarly increase the width of its indices if its memo table grows too large.
- {Dense,Sparse}UnionBuilder may append a new child builder.
- Any nested builder whose child builders include a builder with mutable type.
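To make the mismatch concrete, here is a minimal standalone sketch of the widening behaviour described for AdaptiveIntBuilder. It does not use the Arrow library: ToyTypeId, ToyAdaptiveIntBuilder, initial_type() and finish_type() are hypothetical names invented for illustration, and the real builder's internals may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a type id; only the integer widths matter here.
enum class ToyTypeId { INT8, INT16, INT32, INT64 };

// True if `v` is representable in the signed integer type T.
template <typename T>
bool Fits(int64_t v) {
  return v >= std::numeric_limits<T>::min() && v <= std::numeric_limits<T>::max();
}

// Smallest signed integer type that can hold `v`.
inline ToyTypeId MinimalTypeFor(int64_t v) {
  if (Fits<int8_t>(v)) return ToyTypeId::INT8;
  if (Fits<int16_t>(v)) return ToyTypeId::INT16;
  if (Fits<int32_t>(v)) return ToyTypeId::INT32;
  return ToyTypeId::INT64;
}

// Toy model (an assumption, not Arrow's implementation): tracks only the
// width needed so far, not the values themselves.
class ToyAdaptiveIntBuilder {
 public:
  void Append(int64_t v) {
    ToyTypeId needed = MinimalTypeFor(v);
    if (needed > current_) current_ = needed;  // the width only ever grows
  }

  // The type fixed at construction; it is never updated afterwards.
  ToyTypeId initial_type() const { return ToyTypeId::INT8; }

  // The type the finished array would actually have.
  ToyTypeId finish_type() const { return current_; }

 private:
  ToyTypeId current_ = ToyTypeId::INT8;
};
```

After appending 100 and then 128, finish_type() in this sketch reports INT16 while initial_type() still reports INT8. Any parent builder that captured the child's type at construction would be holding the stale value.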
IMHO if ArrayBuilder::type is sporadically inaccurate then it's a user-hostile API and needs to be fixed.

The current solution is for mutable type to be marked by ArrayBuilder::type() == null. This results in significant loss of metadata from nested types; for example, StructBuilder::FinishInternal currently sets all field names to "" if constructed with null type. Null type is also inconsistently applied; a builder of list(dictionary()) will currently finish to an invalid array if the dictionary builder widens its indices before finishing.

Options:

- Implement array builders such that ArrayBuilder::type() is always the type to which the builder would Finish. There is a PR for this, https://github.com/apache/arrow/pull/4930, but it introduces performance regressions for the dictionary builders: 5% if the values are integers, 1.8% if they are strings.
- Provide ArrayBuilder::UpdateType(); type() is not guaranteed to be accurate unless immediately preceded by UpdateType().
- Remove ArrayBuilder::type() in favor of ArrayBuilder::type_id(), which will be an immutable property of ArrayBuilders.
- Make ArrayBuilder::type() virtual. This will be much more expensive for nested builders and for applications which need to branch on ArrayBuilder::type()->id(); ArrayBuilder::type_id() should be provided as well.
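As a rough illustration of how the UpdateType() and type_id() options could fit together, here is a standalone sketch of a nested builder over a child with mutable type. Nothing here is Arrow's API: ToyIntBuilder, ToyListBuilder and ToyKind are hypothetical, types are represented as plain strings, and the child only widens from int8 to int16 for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical child builder whose value type can widen (int8 -> int16 only).
class ToyIntBuilder {
 public:
  void Append(int64_t v) {
    if (v < -128 || v > 127) width_ = 16;
  }
  // The type this builder would finish to, given what has been appended.
  std::string FinishType() const { return "int" + std::to_string(width_); }

 private:
  int width_ = 8;
};

// Hypothetical immutable kind tag, playing the role of type_id().
enum class ToyKind { LIST };

// Hypothetical nested builder; it does not own the child.
class ToyListBuilder {
 public:
  explicit ToyListBuilder(ToyIntBuilder* child) : child_(child) { UpdateType(); }

  // Immutable: a list builder always finishes to a list, whatever the child does.
  ToyKind type_id() const { return ToyKind::LIST; }

  // Cheap, non-virtual accessor; may be stale until UpdateType() is called.
  const std::string& type() const { return cached_type_; }

  // Recompute the cached type from the child's current state.
  void UpdateType() { cached_type_ = "list<" + child_->FinishType() + ">"; }

 private:
  ToyIntBuilder* child_;
  std::string cached_type_;
};
```

In this sketch, appending 1000 to the child leaves type() at "list<int8>" until UpdateType() is called, after which it reads "list<int16>"; type_id() is LIST throughout. The trade-off is that callers pay for the recomputation only when they ask for it, at the cost of having to remember to ask.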