Pierre Belzile created ARROW-9969:
-------------------------------------

             Summary: [C++] RecordBatchBuilder yields invalid result with dictionary fields
                 Key: ARROW-9969
                 URL: https://issues.apache.org/jira/browse/ARROW-9969
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
    Affects Versions: 1.0.1
            Reporter: Pierre Belzile


The record batch builder takes a schema as input and uses that schema when 
creating the record batch.

However, when one or more fields are dictionary-encoded, the resulting data 
type (in particular its index type) is not known until the dictionary builder 
is flushed, and it often does not match the type declared in the initial 
schema. The builder needs to update the schema to reflect the data type 
actually generated.

This problem is easily reproduced by providing a schema with a field of type 
dictionary(int16(), utf8()) and adding a single row. The resulting column has 
data type dictionary(int8(), utf8()), which does not match the schema.
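
The mismatch can be illustrated with a small self-contained model. This is only a sketch and does not call Arrow: it assumes the dictionary builder picks the narrowest signed index type able to address the dictionary it has seen so far, which would explain the int8 result reported here. SmallestIndexType and MatchesDeclaredIndexType are hypothetical helper names, not Arrow APIs, and the exact widening rule Arrow applies may differ.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper (not an Arrow API): name of the narrowest signed
// integer type that can hold every index into a dictionary containing
// `dictionary_size` distinct values.
std::string SmallestIndexType(std::uint64_t dictionary_size) {
  // Largest index that has to be representable (0 for an empty dictionary).
  const std::uint64_t max_index = dictionary_size == 0 ? 0 : dictionary_size - 1;
  if (max_index <= static_cast<std::uint64_t>(std::numeric_limits<std::int8_t>::max())) {
    return "int8";
  }
  if (max_index <= static_cast<std::uint64_t>(std::numeric_limits<std::int16_t>::max())) {
    return "int16";
  }
  if (max_index <= static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max())) {
    return "int32";
  }
  return "int64";
}

// Hypothetical check: does the index type declared in the schema agree with
// the type this model says the builder would produce on flush?
bool MatchesDeclaredIndexType(const std::string& declared_index_type,
                              std::uint64_t dictionary_size) {
  return declared_index_type == SmallestIndexType(dictionary_size);
}
```

Under this model, a schema declaring int16 indices disagrees with the flushed array whenever the dictionary is small enough for int8 indices, which is the situation in the single-row reproduction above.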



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
