Hello,
I am experimenting with serializing proto3 messages to parquet and have a
question about how "oneof" fields should be treated. I will describe an
example. I'm running parquet 1.11.1 with the patch from PARQUET-1684
applied. That JIRA is about how default values are written out, and seems
related to my question.
SCHEMA
--------
message Person {
  int32 foo = 1;
  oneof optional_bar {
    int32 bar_int = 200;
    int32 bar_int2 = 201;
    string bar_string = 300;
  }
}
CODE
--------
I set values for foo and bar_string:
for (int i = 0; i < 3; i += 1) {
  com.etsy.grpcparquet.Person message = Person.newBuilder()
      .setFoo(i)
      .setBarString("hello world")
      .build();
  message.writeDelimitedTo(out);
}
And then I write the protobuf file out to parquet.
RESULT
-----------
$ parquet-tools show example.parquet
+-------+-----------+------------+--------------+
| foo | bar_int | bar_int2 | bar_string |
|-------+-----------+------------+--------------|
| 0 | 0 | 0 | hello world |
| 1 | 0 | 0 | hello world |
| 2 | 0 | 0 | hello world |
+-------+-----------+------------+--------------+
I would expect bar_int and bar_int2 to be empty (null) for all three rows,
since only bar_string is set in the oneof. Instead they are written as 0,
the proto3 default for int32.
Is this the right expectation for me to have?
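To make the expectation concrete, here is a minimal sketch in plain Java. It
does not use protobuf or parquet at all: the Person and BarCase types below
are hypothetical stand-ins for the generated message and its oneof case enum
(which, as far as I understand, the generated class exposes through
getOptionalBarCase()), and toRow is an illustrative helper, not
parquet-protobuf code. It only shows the mapping I have in mind, where every
oneof member other than the active one becomes null instead of a default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OneofSketch {
    // Hypothetical stand-in for the generated oneof case enum.
    enum BarCase { BAR_INT, BAR_INT2, BAR_STRING, NOT_SET }

    // Hypothetical stand-in for the Person message:
    // foo plus at most one active member of the optional_bar oneof.
    static final class Person {
        final int foo;
        final BarCase barCase;
        final Object barValue;

        Person(int foo, BarCase barCase, Object barValue) {
            this.foo = foo;
            this.barCase = barCase;
            this.barValue = barValue;
        }
    }

    // Returns the oneof value only if 'member' is the active case, else null.
    static Object valueIfActive(Person p, BarCase member) {
        return p.barCase == member ? p.barValue : null;
    }

    // Illustrative mapping from a message to one output row.
    // Inactive oneof members are written as null, not as 0 or "".
    static Map<String, Object> toRow(Person p) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("foo", p.foo);
        row.put("bar_int", valueIfActive(p, BarCase.BAR_INT));
        row.put("bar_int2", valueIfActive(p, BarCase.BAR_INT2));
        row.put("bar_string", valueIfActive(p, BarCase.BAR_STRING));
        return row;
    }

    public static void main(String[] args) {
        // Same data as the example above: foo = i, bar_string set, nothing else.
        for (int i = 0; i < 3; i += 1) {
            Person message = new Person(i, BarCase.BAR_STRING, "hello world");
            // First row prints:
            // {foo=0, bar_int=null, bar_int2=null, bar_string=hello world}
            System.out.println(toRow(message));
        }
    }
}
```

With this mapping, bar_int and bar_int2 are null in all three rows, which is
the result I was hoping to see in the parquet file.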
Thank you!
--
Aaron Niskode-Dossett, Data Engineering -- Etsy