Pierre Belzile created ARROW-9969:
-------------------------------------
Summary: [C++] RecordBatchBuilder yields invalid result with dictionary fields
Key: ARROW-9969
URL: https://issues.apache.org/jira/browse/ARROW-9969
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 1.0.1
Reporter: Pierre Belzile

RecordBatchBuilder takes a schema as input and uses that same schema when
it creates the record batch.

However, when one or more fields are dictionary-typed, the actual data type
(in particular the index type) is not known until the dictionary builder is
flushed, and the initial schema often does not match it. The builder needs to
update the schema to reflect the data type that was actually generated.

The problem is easy to reproduce: provide a schema with a field of type
dictionary(int16(), utf8()) and add a single row. The resulting column has
data type dictionary(int8(), utf8()), which no longer matches the schema.
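
The mismatch described above can be illustrated with a small toy model. This
is only a sketch and does not use the Arrow library: ToyDictionaryBuilder,
IndexWidth, NarrowestIndexWidth and MatchesDeclared are hypothetical names
invented for illustration. It assumes the dictionary builder picks the
narrowest signed index width that can hold the indices it has seen, which is
consistent with the int8 result reported here but is not taken from Arrow's
source.

```cpp
// Toy model only: none of these types are Arrow classes.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Bit widths a signed dictionary index can take in this model.
enum class IndexWidth : int { Int8 = 8, Int16 = 16, Int32 = 32, Int64 = 64 };

// Narrowest signed width able to hold every index in [0, num_values).
inline IndexWidth NarrowestIndexWidth(std::size_t num_values) {
  const std::uint64_t max_index = num_values == 0 ? 0 : num_values - 1;
  if (max_index <= static_cast<std::uint64_t>(
                       std::numeric_limits<std::int8_t>::max())) {
    return IndexWidth::Int8;
  }
  if (max_index <= static_cast<std::uint64_t>(
                       std::numeric_limits<std::int16_t>::max())) {
    return IndexWidth::Int16;
  }
  if (max_index <= static_cast<std::uint64_t>(
                       std::numeric_limits<std::int32_t>::max())) {
    return IndexWidth::Int32;
  }
  return IndexWidth::Int64;
}

// Hypothetical stand-in for a dictionary builder with adaptive index width.
class ToyDictionaryBuilder {
 public:
  // Records the value, assigning a new index the first time it is seen.
  void Append(const std::string& value) {
    auto result = memo_.emplace(value, static_cast<std::int64_t>(memo_.size()));
    indices_.push_back(result.first->second);
  }

  // Index width the finished array would carry in this model.
  IndexWidth ProducedIndexWidth() const {
    return NarrowestIndexWidth(memo_.size());
  }

 private:
  std::unordered_map<std::string, std::int64_t> memo_;
  std::vector<std::int64_t> indices_;
};

// The check a record batch builder would need before reusing the schema it
// was given: does the declared index width match what was produced?
inline bool MatchesDeclared(IndexWidth declared,
                            const ToyDictionaryBuilder& builder) {
  return declared == builder.ProducedIndexWidth();
}

// Mirrors the report: the field is declared with int16 indices, one row is
// appended, and the produced width is int8.
inline void SingleRowExample() {
  ToyDictionaryBuilder builder;
  builder.Append("a");
  assert(builder.ProducedIndexWidth() == IndexWidth::Int8);
  assert(!MatchesDeclared(IndexWidth::Int16, builder));
}
```

In this model the declared and produced widths agree only once the
dictionary holds more than 128 distinct values, so a builder that reuses the
input schema unchanged would need either to update the field type after
flushing or to force the declared index width.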