Is the current version supposed to allow struct columns with null values to be
written to Parquet?

I narrowed it down to a table with one column and two rows, and the resulting
Parquet file is broken according to both parquet-tools and our own reader (it
looks like a buffer is not written in full, but I haven't dug much deeper).

This is the table:

struct: struct<int: int64>
  child 0, int: int64
----
struct:
  [
    -- is_valid:
      [
        false,
        true
      ]
    -- child 0 type: int64
      [
        null,
        2
      ]
  ]

And this is the code that generates the repro table:

#include <memory>

#include <arrow/api.h>
#include <parquet/exception.h>  // PARQUET_THROW_NOT_OK

std::shared_ptr<arrow::Table> generate_table2() {
    // Builder for the single int64 child field of the struct.
    auto i64builder = std::make_shared<arrow::Int64Builder>();
    const std::shared_ptr<arrow::DataType> structType =
        arrow::struct_({arrow::field("int", arrow::int64())});
    arrow::StructBuilder structBuilder(structType, arrow::default_memory_pool(), {
            std::static_pointer_cast<arrow::ArrayBuilder>(i64builder)});
    // Row 0: the whole struct is null.
    PARQUET_THROW_NOT_OK(structBuilder.AppendNull());
    // Row 1: a valid struct whose "int" child is 2.
    PARQUET_THROW_NOT_OK(structBuilder.Append());
    PARQUET_THROW_NOT_OK(i64builder->Append(2));
    std::shared_ptr<arrow::Array> structArray;
    PARQUET_THROW_NOT_OK(structBuilder.Finish(&structArray));
    std::shared_ptr<arrow::Schema> schema =
        arrow::schema({arrow::field("struct", structType)});
    return arrow::Table::Make(schema, {structArray});
}

Is this a bug, a known limitation, or am I doing something dumb?

Thank you
Radu
