I played around with the code and found a simple, maybe too simple, solution and opened a PR. Fingers crossed.
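
For anyone following along, here is a rough sketch of the check David describes in the quoted thread: emit a oneof member only when its has*() accessor reports it as set, and write null otherwise. This is not the code in the PR and it does not touch parquet-protobuf at all. The OneofSketch, Person, and toRow names are hypothetical stand-ins, with Person hand-written to mimic the accessors that protoc would generate for the schema in the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OneofSketch {
    /** Which member of the oneof is set (stand-in for the generated case enum). */
    public enum BarCase { BAR_INT, BAR_INT2, BAR_STRING, NOT_SET }

    /** Hypothetical hand-written stand-in for the generated Person message. */
    public static final class Person {
        final int foo;
        final BarCase barCase;
        final Object bar; // value of the member that is set, or null

        public Person(int foo, BarCase barCase, Object bar) {
            this.foo = foo;
            this.barCase = barCase;
            this.bar = bar;
        }

        public boolean hasBarInt()    { return barCase == BarCase.BAR_INT; }
        public boolean hasBarInt2()   { return barCase == BarCase.BAR_INT2; }
        public boolean hasBarString() { return barCase == BarCase.BAR_STRING; }

        // Like protobuf getters, unset members report their default value.
        public int getBarInt()       { return hasBarInt() ? (Integer) bar : 0; }
        public int getBarInt2()      { return hasBarInt2() ? (Integer) bar : 0; }
        public String getBarString() { return hasBarString() ? (String) bar : ""; }
    }

    /** Builds one output row, writing null for oneof members that are not set. */
    public static Map<String, Object> toRow(Person p) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("foo", p.foo);
        // Consult has*() first; the getter alone cannot tell "unset" from 0.
        row.put("bar_int", p.hasBarInt() ? Integer.valueOf(p.getBarInt()) : null);
        row.put("bar_int2", p.hasBarInt2() ? Integer.valueOf(p.getBarInt2()) : null);
        row.put("bar_string", p.hasBarString() ? p.getBarString() : null);
        return row;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i += 1) {
            Person message = new Person(i, BarCase.BAR_STRING, "hello world");
            System.out.println(toRow(message));
        }
    }
}
```

With only bar_string set, the getters for bar_int and bar_int2 still return 0, which is why a writer that looks at getters alone ends up with the 0 columns shown in the thread.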

On Tue, Sep 29, 2020 at 10:55 AM Aaron Niskode-Dossett <[email protected]> wrote:

> Thank you, David, I agree with your conclusions. I opened PARQUET-1917.
>
> On Tue, Sep 29, 2020 at 10:18 AM David <[email protected]> wrote:
>
>> Hello,
>>
>> Perhaps a bit more nuance here. I believe that the values are technically
>> correct (they should be the default value of 0), but we should not be
>> storing them as 0 values. We need to check the hasBar*() to determine if
>> the value should be stored or omitted.
>>
>> Thanks.
>>
>> On Tue, Sep 29, 2020 at 10:39 AM David <[email protected]> wrote:
>>
>> > Hello,
>> >
>> > I too have been poking around the Parquet-Proto package as well.
>> >
>> > I would expect "bar_int" and "bar_int2" to be 'null' here.
>> >
>> > Have you filed a JIRA with this reproduction?
>> >
>> > Thanks.
>> >
>> > On Fri, Sep 25, 2020 at 9:58 AM Aaron Niskode-Dossett
>> > <[email protected]> wrote:
>> >
>> >> Hello,
>> >>
>> >> I am experimenting with serializing protobuf3 to parquet and have a
>> >> question about how "oneOf" fields should be treated. I will describe an
>> >> example. I'm running parquet 1.11.1 with PARQUET-1684 applied. That JIRA
>> >> is about how default values are written out, and seems related to my
>> >> question.
>> >>
>> >> SCHEMA
>> >> --------
>> >> message Person {
>> >>   int32 foo = 1;
>> >>   oneof optional_bar {
>> >>     int32 bar_int = 200;
>> >>     int32 bar_int2 = 201;
>> >>     string bar_string = 300;
>> >>   }
>> >> }
>> >>
>> >> CODE
>> >> --------
>> >> I set values for foo and bar_string
>> >>
>> >> for (int i = 0; i < 3; i += 1) {
>> >>   com.etsy.grpcparquet.Person message = Person.newBuilder()
>> >>       .setFoo(i)
>> >>       .setBarString("hello world")
>> >>       .build();
>> >>   message.writeDelimitedTo(out);
>> >> }
>> >>
>> >> And then I write the protobuf file out to parquet.
>> >>
>> >> RESULT
>> >> -----------
>> >> $ parquet-tools show example.parquet
>> >>
>> >> +-------+-----------+------------+--------------+
>> >> |   foo |   bar_int |   bar_int2 | bar_string   |
>> >> |-------+-----------+------------+--------------|
>> >> |     0 |         0 |          0 | hello world  |
>> >> |     1 |         0 |          0 | hello world  |
>> >> |     2 |         0 |          0 | hello world  |
>> >> +-------+-----------+------------+--------------+
>> >>
>> >> I would expect that bar_int and bar_int2 are EMPTY for all three rows
>> >> since only bar_string is set in the oneof.
>> >>
>> >> Is this the right expectation for me to have?
>> >>
>> >> Thank you!
>> >>
>> >> --
>> >> Aaron Niskode-Dossett, Data Engineering -- Etsy
>> >>
>> >
>
> --
> Aaron Niskode-Dossett, Data Engineering -- Etsy
>

--
Aaron Niskode-Dossett, Data Engineering -- Etsy
