Kyle Kavanagh created ARROW-12676:
-------------------------------------
Summary: RecordBatchBuilder with uint dictionary creates signed int Batch
Key: ARROW-12676
URL: https://issues.apache.org/jira/browse/ARROW-12676
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Affects Versions: 3.0.0
Reporter: Kyle Kavanagh
When a RecordBatchBuilder whose schema contains a dictionary field with a uint32
index type is flushed to a batch, the resulting batch has an int32 index type
for that field:
{code:java}
BatchBuilder schema after flush:
Symbol: dictionary<values=string, indices=int16, ordered=0>
Status: dictionary<values=string, indices=uint32, ordered=0>{code}
{code:java}
Batch schema after flush:
Symbol: dictionary<values=string, indices=int16, ordered=0>
Status: dictionary<values=string, indices=int32, ordered=0>
{code}
from:
{code:java}
std::shared_ptr<arrow::RecordBatch> batch;
auto status = batchBuilder_->Flush(&batch);
// Check the status first so batch is never dereferenced after a failed flush.
if (!status.ok()) {
  throw Exception("Arrow batch flush failed: {}", status);
}
std::cout << "BatchBuilder schema after flush:\n"
          << batchBuilder_->schema()->ToString() << std::endl;
std::cout << "Batch schema after flush:\n"
          << batch->schema()->ToString() << std::endl;
{code}
This results in a failure to write: "Invalid: Tried to write record batch with
different schema"
I believe this is related to https://issues.apache.org/jira/browse/ARROW-9969
and in particular, this bit:
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/table_builder.cc#L72]
Is the dictionary->Equals comparison checking the signed-ness of the indices?
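To make the question concrete, here is a toy C++ model. It is not Arrow's implementation: `IndexType`, `DictionaryType`, `EqualsIgnoringSign`, and `EqualsStrict` are hypothetical names invented for illustration. It sketches how a comparison that looks only at the index width would treat the uint32 and int32 dictionary types above as equal, while a comparison that also checks signedness would tell them apart. Whether Arrow's own comparison behaves like either of these is exactly what is being asked.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for illustration only; these are NOT Arrow classes.
struct IndexType {
  int bit_width;   // 8, 16, 32 or 64
  bool is_signed;  // true for intN, false for uintN
};

struct DictionaryType {
  IndexType index;
  std::string value_type;  // e.g. "string"
  bool ordered;
};

// Renders the type in the same shape as the schema output quoted above.
std::string ToString(const DictionaryType& t) {
  return "dictionary<values=" + t.value_type + ", indices=" +
         (t.index.is_signed ? "int" : "uint") + std::to_string(t.index.bit_width) +
         ", ordered=" + (t.ordered ? "1" : "0") + ">";
}

// Hypothetical comparison that ignores signedness: uint32 and int32 collapse.
bool EqualsIgnoringSign(const DictionaryType& a, const DictionaryType& b) {
  return a.index.bit_width == b.index.bit_width &&
         a.value_type == b.value_type && a.ordered == b.ordered;
}

// Hypothetical comparison that also requires matching signedness.
bool EqualsStrict(const DictionaryType& a, const DictionaryType& b) {
  return EqualsIgnoringSign(a, b) && a.index.is_signed == b.index.is_signed;
}

// Usage: the two Status types from the report differ only in signedness.
void Demo() {
  const DictionaryType requested{{32, false}, "string", false};  // uint32 indices
  const DictionaryType produced{{32, true}, "string", false};    // int32 indices
  assert(EqualsIgnoringSign(requested, produced));  // mismatch goes unnoticed
  assert(!EqualsStrict(requested, produced));       // mismatch is detected
}
```

If the real comparison resembled `EqualsIgnoringSign`, the mismatch would only surface later, when the writer compares full schemas, which would be consistent with the error message above.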