Hi Radu,
This appears to be a bug. Would you mind filing an issue in JIRA?

I'm looking into it to see if I can figure out what is going on.

Thanks,
Micah

On Wed, Jul 29, 2020 at 1:07 PM Radu Teodorescu
<radukay...@yahoo.com.invalid> wrote:

> Is the current version supposed to allow struct columns with null values
> to be written to Parquet?
>
> I narrowed it down to a table with one column and two rows. The
> resulting Parquet file is broken according to both parquet-tools and
> our own reader (it looks like a buffer is not written in full, but
> I haven't dug much deeper).
>
> This is the table:
>
> struct: struct<int: int64>
>   child 0, int: int64
> ----
> struct:
>   [
>     -- is_valid:
>           [
>         false,
>         true
>       ]
>     -- child 0 type: int64
>       [
>         null,
>         2
>       ]
>   ]
>
> and this is my repro table generation:
>
> std::shared_ptr<arrow::Table> generate_table2() {
>     auto i64builder = std::make_shared<arrow::Int64Builder>();
>     const std::shared_ptr<arrow::DataType> structType =
> arrow::struct_({arrow::field("int", arrow::int64())});
>     arrow::StructBuilder structBuilder(structType,
> arrow::default_memory_pool(), {
>             std::static_pointer_cast<arrow::ArrayBuilder>(i64builder)});
>     PARQUET_THROW_NOT_OK(structBuilder.AppendNull());
>     PARQUET_THROW_NOT_OK(structBuilder.Append());
>     PARQUET_THROW_NOT_OK(i64builder->Append(2));
>     std::shared_ptr<arrow::Array> structArray;
>     PARQUET_THROW_NOT_OK(structBuilder.Finish(&structArray));
>     std::shared_ptr<arrow::Schema> schema =
> arrow::schema({arrow::field("struct",structType)});
>     return arrow::Table::Make(schema, {structArray});
> }
>
> Is this a bug, a known limitation, or am I doing something dumb?
>
> Thank you
> Radu
>
>
