Didn't realize this question was on the Arrow mailing list instead of
the Parquet mailing list!

You can make things much easier on yourself by putting your data in
Arrow arrays and using the parquet::arrow APIs.

If you want to write the data using the lower-level Parquet column
writer API, you will have to be careful with the repetition/definition
levels. In your case, I believe the values you write need to have
definition level 2 (the repeated node and optional node both increment
the definition level by 1).
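
To make the level bookkeeping concrete, here is a minimal sketch that
flattens rows of nullable int32 lists into definition levels, repetition
levels and values for the schema quoted below (required my_array,
repeated list, optional element). The Element and Levels structs and the
ComputeLevels function are hypothetical names used only for
illustration; they are not part of parquet-cpp. It assumes the usual
Dremel rules: repetition level 0 starts a new record and 1 continues the
current list, while definition level 0 means an empty list, 1 a null
element and 2 a present value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one list element, either null or an int32.
struct Element {
  bool is_null;
  int32_t value;
};

// Hypothetical helper type: the flattened column data.
struct Levels {
  std::vector<int16_t> def_levels;
  std::vector<int16_t> rep_levels;
  std::vector<int32_t> values;  // non-null values only
};

// Assumed schema:
//   required group my_array (LIST) {
//     repeated group list { optional int32 element; }
//   }
// Max definition level is 2, max repetition level is 1.
Levels ComputeLevels(const std::vector<std::vector<Element>>& rows) {
  Levels out;
  for (const auto& row : rows) {
    if (row.empty()) {
      // Empty list: the repeated node is absent, so definition level 0.
      out.def_levels.push_back(0);
      out.rep_levels.push_back(0);
      continue;
    }
    for (std::size_t i = 0; i < row.size(); ++i) {
      // 0 starts a new record, 1 continues the current list.
      out.rep_levels.push_back(i == 0 ? 0 : 1);
      if (row[i].is_null) {
        // The list entry exists but the optional element is null.
        out.def_levels.push_back(1);
      } else {
        // Fully defined value: repeated + optional nodes both count.
        out.def_levels.push_back(2);
        out.values.push_back(row[i].value);
      }
    }
  }
  // Every level slot has both a definition and a repetition level.
  assert(out.def_levels.size() == out.rep_levels.size());
  return out;
}
```

With the column writer, the three arrays would then go to a single
WriteBatch call, with the number of levels as the first argument and
only the non-null values in the values buffer.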

I find this blog post helpful for understanding how the levels work:
https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet.html
There is also the Google Dremel paper.

- Wes

On Fri, Dec 8, 2017 at 6:19 PM, Renato Marroquín Mogrovejo
<renatoj.marroq...@gmail.com> wrote:
> Thanks Wes! So I create it this way, but I still don't know how to populate
> and
>
> auto element = PrimitiveNode::Make("element", Repetition::OPTIONAL,
> Type::INT32);
> auto list = GroupNode::Make("list", Repetition::REPEATED, {element});
> auto my_array = GroupNode::Make("my_array", Repetition::REQUIRED, {list},
> LogicalType::LIST);
> fields.push_back(PrimitiveNode::Make("id", Repetition::REQUIRED,
> Type::INT32, LogicalType::NONE));
> fields.push_back(my_array);
> auto my_schema = GroupNode::Make("schema", Repetition::REQUIRED, fields);
>
> I tried populating it this way:
>
>        parquet::Int32Writer* int32_writer1 =
> static_cast<parquet::Int32Writer*>(rg_writer->NextColumn());
>        for (int i = 0; i < NROWS_GROUP; i++) {
>          int32_t value = i;
>          int16_t definition_level = 1;
>          int16_t repetition_level = 0;
>          if ((i+1)%2 == 0) {
>            repetition_level = 1;  // start of a new record
>          }
>          int32_writer1->WriteBatch(1, &definition_level, &repetition_level,
> &value);
>       }
>
> That seems to work, but I can't use the generated file on Athena and using
> the parquet_reader from parquet_cpp returns NULLs on the elements. Is it
> that I have to get a handle to the list element? Thanks again for the help!
>
