I think the problem is in WriteLevelSpaced [1]: the way we calculate "min_spaced_def_level" seems incorrect (I think it only worked for singly nested lists). This value probably needs to be calculated by walking up the tree to find the def level of the first repeated node.
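To make the "walk up the tree" idea concrete, here is a rough sketch. It does not use the real parquet-cpp schema classes: `Node`, `Repetition`, `DefLevelAt` and `NearestRepeatedDefLevel` are hypothetical names made up for illustration, and I am assuming the usual Parquet rule that every optional or repeated node on the path from the root adds one definition level.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified schema node (NOT the parquet-cpp schema API).
enum class Repetition { kRequired, kOptional, kRepeated };

struct Node {
  Repetition repetition;
  const Node* parent;  // nullptr for a top-level field
};

// Definition level at `node`: each optional or repeated node on the path
// from the top-level field down to `node` contributes one level.
int16_t DefLevelAt(const Node* node) {
  int16_t level = 0;
  for (const Node* n = node; n != nullptr; n = n->parent) {
    if (n->repetition != Repetition::kRequired) ++level;
  }
  return level;
}

// Walk up from `leaf` (inclusive) and return the definition level of the
// first repeated node found, or 0 if nothing on the path is repeated.
int16_t NearestRepeatedDefLevel(const Node* leaf) {
  assert(leaf != nullptr);
  for (const Node* n = leaf; n != nullptr; n = n->parent) {
    if (n->repetition == Repetition::kRepeated) return DefLevelAt(n);
  }
  return 0;
}
```

For the struct<int: int64> case from the report (optional struct, optional int, nothing repeated) this returns 0, while for an optional list of optional structs it returns the level of the repeated group rather than a value derived from the leaf's max definition level alone.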
[1] https://github.com/apache/arrow/blob/3586292d62c8c348e9fb85676eb524cde53179cf/cpp/src/parquet/column_writer.cc#L1141

On Wed, Jul 29, 2020 at 8:01 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Radu,
> This appears to be a bug, would you mind filing a bug in JIRA?
>
> I'm looking into it to see if I can figure out what is going on.
>
> Thanks,
> Micah
>
> On Wed, Jul 29, 2020 at 1:07 PM Radu Teodorescu
> <radukay...@yahoo.com.invalid> wrote:
>
>> Is the current version supposed to allow struct columns with null values
>> to be written to parquet:
>>
>> I narrowed it down to a two rows table with one column and two rows and
>> the resulting parquet file is broken both according to parquet-tools as
>> well as our own reader (it looks like a buffer is not written in full, but
>> I haven’t dug much deeper)
>>
>> This is the table:
>>
>> struct: struct<int: int64>
>>   child 0, int: int64
>> ----
>> struct:
>>   [
>>     -- is_valid:
>>       [
>>         false,
>>         true
>>       ]
>>     -- child 0 type: int64
>>       [
>>         null,
>>         2
>>       ]
>>   ]
>>
>> and this is my repro table generation:
>>
>> std::shared_ptr<arrow::Table> generate_table2() {
>>   auto i64builder = std::make_shared<arrow::Int64Builder>();
>>   const std::shared_ptr<arrow::DataType> structType =
>>       arrow::struct_({arrow::field("int", arrow::int64())});
>>   arrow::StructBuilder structBuilder(structType,
>>       arrow::default_memory_pool(), {
>>       std::static_pointer_cast<arrow::ArrayBuilder>(i64builder)});
>>   PARQUET_THROW_NOT_OK(structBuilder.AppendNull());
>>   PARQUET_THROW_NOT_OK(structBuilder.Append());
>>   PARQUET_THROW_NOT_OK(i64builder->Append(2));
>>   std::shared_ptr<arrow::Array> structArray;
>>   PARQUET_THROW_NOT_OK(structBuilder.Finish(&structArray));
>>   std::shared_ptr<arrow::Schema> schema =
>>       arrow::schema({arrow::field("struct",structType)});
>>   return arrow::Table::Make(schema, {structArray});
>> }
>>
>> Is this a bug, know limitation or am I doing something dumb?
>>
>> Thank you
>> Radu