Is the current version supposed to allow struct columns with null values to be
written to Parquet?
I narrowed it down to a table with one column and two rows. The resulting
Parquet file is broken according to both parquet-tools and our own reader (it
looks like a buffer is not written in full, but I haven't dug much deeper).
This is the table:
struct: struct<int: int64>
child 0, int: int64
----
struct:
[
-- is_valid:
[
false,
true
]
-- child 0 type: int64
[
null,
2
]
]
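For reference, here is a minimal sketch of the definition levels the "int" leaf of this table should carry in Parquet. It assumes both the struct and its int field are optional (the default for arrow::field), so the maximum definition level is 2. Row, LeafColumn, shred and repro_rows are hypothetical names made up for illustration. None of this is Arrow or Parquet API, only plain C++ that mimics the encoding rule.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for one row of struct<int: int64> (not an Arrow type).
struct Row {
  bool struct_valid;                      // the is_valid bit of the struct
  std::optional<std::int64_t> int_value;  // the child value, if any
};

// What a Parquet leaf column stores: one definition level per row,
// plus only the values that are actually present.
struct LeafColumn {
  std::vector<int> def_levels;
  std::vector<std::int64_t> values;
};

// Assumption: struct and field are both optional, so
// 0 = struct is null, 1 = struct present but int null, 2 = int present.
int definition_level(const Row& row) {
  if (!row.struct_valid) return 0;
  if (!row.int_value.has_value()) return 1;
  return 2;
}

LeafColumn shred(const std::vector<Row>& rows) {
  LeafColumn col;
  for (const Row& row : rows) {
    const int level = definition_level(row);
    col.def_levels.push_back(level);
    if (level == 2) col.values.push_back(*row.int_value);
  }
  // Every row contributes exactly one definition level.
  assert(col.def_levels.size() == rows.size());
  return col;
}

// The two rows from the table above: a null struct, then {int: 2}.
std::vector<Row> repro_rows() {
  return {Row{false, std::nullopt}, Row{true, 2}};
}
```

Under these assumptions, shred(repro_rows()) gives definition levels 0 and 2 and a single stored value, 2. A file whose data page disagrees with that would be consistent with the truncated buffer I suspect.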
And this is my repro table generation:
#include <memory>

#include <arrow/api.h>
#include <parquet/exception.h>  // PARQUET_THROW_NOT_OK

std::shared_ptr<arrow::Table> generate_table2() {
  auto i64builder = std::make_shared<arrow::Int64Builder>();
  const std::shared_ptr<arrow::DataType> structType =
      arrow::struct_({arrow::field("int", arrow::int64())});
  arrow::StructBuilder structBuilder(
      structType, arrow::default_memory_pool(),
      {std::static_pointer_cast<arrow::ArrayBuilder>(i64builder)});
  // Row 0: null struct.
  PARQUET_THROW_NOT_OK(structBuilder.AppendNull());
  // Row 1: valid struct with int = 2.
  PARQUET_THROW_NOT_OK(structBuilder.Append());
  PARQUET_THROW_NOT_OK(i64builder->Append(2));
  std::shared_ptr<arrow::Array> structArray;
  PARQUET_THROW_NOT_OK(structBuilder.Finish(&structArray));
  // Single-column table holding the struct array.
  std::shared_ptr<arrow::Schema> schema =
      arrow::schema({arrow::field("struct", structType)});
  return arrow::Table::Make(schema, {structArray});
}
Is this a bug, a known limitation, or am I doing something dumb?
Thank you
Radu