Thank you, David, I agree with your conclusions. I opened PARQUET-1917.

On Tue, Sep 29, 2020 at 10:18 AM David <[email protected]> wrote:

> Hello,
>
> Perhaps a bit more nuance here. I believe that the values are technically
> correct (they should be the default value of 0), but we should not be
> storing them as 0 values. We need to check the hasBar*() to determine if
> the value should be stored or omitted.
>
> Thanks.
>
> On Tue, Sep 29, 2020 at 10:39 AM David <[email protected]> wrote:
>
> > Hello,
> >
> > I too have been poking around the Parquet-Proto package as well.
> >
> > I would expect "bar_int" and "bar_int2" to be 'null' here.
> >
> > Have you filed a JIRA with this reproduction?
> >
> > Thanks.
> >
> > On Fri, Sep 25, 2020 at 9:58 AM Aaron Niskode-Dossett
> > <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> I am experimenting with serializing protobuf3 to parquet and have a
> >> question about how "oneOf" fields should be treated. I will describe an
> >> example. I'm running parquet 1.11.1 with PARQUET-1684 applied. That JIRA
> >> is about how default values are written out, and seems related to my
> >> question.
> >>
> >> SCHEMA
> >> --------
> >> message Person {
> >>   int32 foo = 1;
> >>   oneof optional_bar {
> >>     int32 bar_int = 200;
> >>     int32 bar_int2 = 201;
> >>     string bar_string = 300;
> >>   }
> >> }
> >>
> >> CODE
> >> --------
> >> I set values for foo and bar_string
> >>
> >> for (int i = 0; i < 3; i += 1) {
> >>     com.etsy.grpcparquet.Person message = Person.newBuilder()
> >>         .setFoo(i)
> >>         .setBarString("hello world")
> >>         .build();
> >>     message.writeDelimitedTo(out);
> >> }
> >> And then I write the protobuf file out to parquet.
> >>
> >> RESULT
> >> -----------
> >> $ parquet-tools show example.parquet
> >>
> >> +-------+-----------+------------+--------------+
> >> | foo   | bar_int   | bar_int2   | bar_string   |
> >> |-------+-----------+------------+--------------|
> >> |     0 |         0 |          0 | hello world  |
> >> |     1 |         0 |          0 | hello world  |
> >> |     2 |         0 |          0 | hello world  |
> >> +-------+-----------+------------+--------------+
> >>
> >> I would expect that bar_int and bar_int2 are EMPTY for all three rows
> >> since only bar_string is set in the oneof.
> >>
> >> Is this the right expectation for me to have?
> >>
> >> Thank you!
> >>
> >> --
> >> Aaron Niskode-Dossett, Data Engineering -- Etsy
> >>
> >
>

--
Aaron Niskode-Dossett, Data Engineering -- Etsy
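
For anyone following the thread, here is a minimal sketch of the rule David describes (check hasBar*() before storing a oneof member). It is not the parquet-protobuf write path: to keep it runnable without protobuf or parquet on the classpath, Person below is a hand-written stand-in for the generated class, and OneofRowSketch, toRow, withBarString, withBarInt, and the Map-based "row" are all hypothetical names made up for illustration. The only point is that a oneof member that is not the active case is emitted as null instead of its default value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hand-written stand-in, not protobuf-generated code
// and not the actual parquet-protobuf writer.
final class OneofRowSketch {

    // Mimics a message with a oneof: at most one member is active at a time.
    static final class Person {
        enum OptionalBarCase { BAR_INT, BAR_INT2, BAR_STRING, OPTIONALBAR_NOT_SET }

        private final int foo;
        private final OptionalBarCase barCase;
        private final Object barValue;

        private Person(int foo, OptionalBarCase barCase, Object barValue) {
            this.foo = foo;
            this.barCase = barCase;
            this.barValue = barValue;
        }

        // Hypothetical factories standing in for the generated builder.
        static Person withBarString(int foo, String value) {
            return new Person(foo, OptionalBarCase.BAR_STRING, value);
        }

        static Person withBarInt(int foo, int value) {
            return new Person(foo, OptionalBarCase.BAR_INT, value);
        }

        int getFoo() { return foo; }

        boolean hasBarInt() { return barCase == OptionalBarCase.BAR_INT; }
        boolean hasBarInt2() { return barCase == OptionalBarCase.BAR_INT2; }
        boolean hasBarString() { return barCase == OptionalBarCase.BAR_STRING; }

        // Getters return the type default when the member is not the active
        // case, which is why the table in the original report shows 0.
        int getBarInt() { return hasBarInt() ? (Integer) barValue : 0; }
        int getBarInt2() { return hasBarInt2() ? (Integer) barValue : 0; }
        String getBarString() { return hasBarString() ? (String) barValue : ""; }
    }

    // Builds one output row; inactive oneof members become null, not defaults.
    static Map<String, Object> toRow(Person p) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("foo", p.getFoo());
        row.put("bar_int", p.hasBarInt() ? Integer.valueOf(p.getBarInt()) : null);
        row.put("bar_int2", p.hasBarInt2() ? Integer.valueOf(p.getBarInt2()) : null);
        row.put("bar_string", p.hasBarString() ? p.getBarString() : null);
        return row;
    }

    public static void main(String[] args) {
        // Same loop shape as the reproduction: only foo and bar_string are set.
        for (int i = 0; i < 3; i += 1) {
            System.out.println(toRow(Person.withBarString(i, "hello world")));
        }
    }
}
```

Note that a member explicitly set to 0 (for example withBarInt(2, 0) in this sketch) is still written as 0, because the has-check looks at which case is active, not at the value. With the real generated classes the same decision could presumably be made from the oneof case accessor or from hasField() on the field descriptor, but I have not checked how that fits into the existing writer.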
