Didn't realize this question was on the Arrow mailing list instead of the Parquet mailing list!
You can make things much easier on yourself by putting your data in Arrow arrays and using the parquet::arrow APIs.

If you want to write the data using the lower-level Parquet column writer API, you will have to be careful with the repetition/definition levels. In your case, I believe the values you write need to have definition level 2 (the repeated node and optional node both increment the definition level by 1). I find this blog helpful for this:

https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet.html

There is also the Google Dremel paper.

- Wes

On Fri, Dec 8, 2017 at 6:19 PM, Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com> wrote:
> Thanks Wes! So I create it this way, but I still don't know how to populate
> and
>
> auto element = PrimitiveNode::Make("element", Repetition::OPTIONAL,
> Type::INT32);
> auto list = GroupNode::Make("list", Repetition::REPEATED, {element});
> auto my_array = GroupNode::Make("my_array", Repetition::REQUIRED, {list},
> LogicalType::LIST);
> fields.push_back(PrimitiveNode::Make("id", Repetition::REQUIRED,
> Type::INT32, LogicalType::NONE));
> fields.push_back(my_array);
> auto my_schema = GroupNode::Make("schema", Repetition::REQUIRED, fields);
>
> I tried populating it this way:
>
> parquet::Int32Writer* int32_writer1 =
> static_cast<parquet::Int32Writer*>(rg_writer->NextColumn());
> for (int i = 0; i < NROWS_GROUP; i++) {
> int32_t value = i;
> int16_t definition_level = 1;
> int16_t repetition_level = 0;
> if ((i+1)%2 == 0) {
> repetition_level = 1; // start of a new record
> }
> int32_writer1->WriteBatch(1, &definition_level, &repetition_level,
> &value);
> }
>
> That seems to work, but I can't use the generated file on Athena and using
> the parquet_reader from parquet_cpp returns NULLs on the elements. Is it
> that I have to get a handle to the list element? Thanks again for the help!
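
To make the level bookkeeping for the my_array column concrete, here is a minimal sketch in plain C++ that does not call parquet-cpp at all. It only computes the definition and repetition levels that the Dremel encoding would assign for the schema quoted above (required group, repeated "list", optional "element"). The names Element, Value, Null, ListColumnLevels and EncodeListColumn are hypothetical helpers made up for this illustration, and the sketch assumes the max definition level is 2 and the max repetition level is 1, as described in the reply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one list element that may be null.
struct Element {
  bool is_null;
  int32_t value;
};

inline Element Value(int32_t v) { return Element{false, v}; }
inline Element Null() { return Element{true, 0}; }

typedef std::vector<Element> Int32List;

// Levels for a column shaped like the one in the thread:
//   required group my_array (LIST) {
//     repeated group list {
//       optional int32 element;
//     }
//   }
// Assumption: max definition level = 2, max repetition level = 1.
struct ListColumnLevels {
  std::vector<int16_t> def_levels;
  std::vector<int16_t> rep_levels;
  std::vector<int32_t> values;  // non-null elements only
};

ListColumnLevels EncodeListColumn(const std::vector<Int32List>& rows) {
  ListColumnLevels out;
  for (std::size_t r = 0; r < rows.size(); ++r) {
    const Int32List& row = rows[r];
    if (row.empty()) {
      // Empty list: a single entry with nothing defined below my_array.
      out.def_levels.push_back(0);
      out.rep_levels.push_back(0);
      continue;
    }
    for (std::size_t i = 0; i < row.size(); ++i) {
      // Repetition level 0 starts a new record; 1 continues the same list.
      out.rep_levels.push_back(i == 0 ? 0 : 1);
      if (row[i].is_null) {
        // "list" is defined but "element" is null.
        out.def_levels.push_back(1);
      } else {
        // Both the repeated and the optional node are defined.
        out.def_levels.push_back(2);
        out.values.push_back(row[i].value);
      }
    }
  }
  return out;
}
```

For the rows [1, 2], [], [null], [3] this yields definition levels 2, 2, 0, 1, 2, repetition levels 0, 1, 0, 0, 0, and values 1, 2, 3. As far as I understand the column writer, these three arrays correspond to the def_levels, rep_levels and values arguments of WriteBatch, with the count being the number of levels (5 here) rather than the number of non-null values, but please check that against the parquet-cpp headers.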