Hello,

I am experimenting with serializing protobuf3 to parquet and have a
question about how "oneOf" fields should be treated.  I will describe an
example.  I'm running parquet 1.11.1 with PARQUET-1684 applied.  That JIRA
is about how default values are written out, and seems related to my
question.

SCHEMA
--------
message Person {
  int32 foo = 1;
  oneof optional_bar {
    int32 bar_int = 200;
    int32 bar_int2 = 201;
    string bar_string = 300;
  }
}

CODE
--------
I set values for foo and bar_string only:

for (int i = 0; i < 3; i += 1) {
    // Only bar_string is set in the optional_bar oneof.
    com.etsy.grpcparquet.Person message = Person.newBuilder()
            .setFoo(i)
            .setBarString("hello world")
            .build();
    // Length-delimited, so several messages can share one stream (out).
    message.writeDelimitedTo(out);
}

Then I write the protobuf file out to Parquet.

RESULT
-----------
$ parquet-tools show example.parquet


+-------+-----------+------------+--------------+
|   foo |   bar_int |   bar_int2 | bar_string   |
|-------+-----------+------------+--------------|
|     0 |         0 |          0 | hello world  |
|     1 |         0 |          0 | hello world  |
|     2 |         0 |          0 | hello world  |
+-------+-----------+------------+--------------+

I would expect bar_int and bar_int2 to be EMPTY (null) in all three rows,
since only bar_string is set in the oneof.
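To make the expectation concrete, here is a minimal, dependency-free sketch of the row shape I have in mind. It does not use protobuf or parquet-mr: OptionalBarCase only mirrors the case enum protoc generates for a oneof, and OneofRowSketch and toRow are hypothetical names for illustration. Only the active oneof member gets a value; the other members are null rather than their proto3 defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the Person message; it illustrates the expected
// row shape only and is not parquet-protobuf's actual writer logic.
public class OneofRowSketch {

    // Mirrors the cases protoc would generate for `oneof optional_bar`.
    enum OptionalBarCase { BAR_INT, BAR_INT2, BAR_STRING, NOT_SET }

    // Builds one output row. Oneof members that are not the active case
    // are written as null instead of their proto3 default (0 or "").
    static Map<String, Object> toRow(int foo, OptionalBarCase barCase, Object barValue) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("foo", foo);
        row.put("bar_int", barCase == OptionalBarCase.BAR_INT ? barValue : null);
        row.put("bar_int2", barCase == OptionalBarCase.BAR_INT2 ? barValue : null);
        row.put("bar_string", barCase == OptionalBarCase.BAR_STRING ? barValue : null);
        return row;
    }

    public static void main(String[] args) {
        // Same three messages as in the loop above: only bar_string is set.
        for (int i = 0; i < 3; i += 1) {
            System.out.println(toRow(i, OptionalBarCase.BAR_STRING, "hello world"));
        }
    }
}
```

Running main prints `{foo=0, bar_int=null, bar_int2=null, bar_string=hello world}` and the same for foo=1 and foo=2, which is the table I expected instead of the zeros above.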

Is this the right expectation for me to have?

Thank you!

-- 
Aaron Niskode-Dossett, Data Engineering -- Etsy
