I played around with the code and found a simple, maybe too simple,
solution and opened a PR.  Fingers crossed.
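
For anyone following along, here is a minimal sketch of the idea from the thread below: write a oneof member only when its has*() check is true, and leave it null otherwise. This is not the code in the PR. The Person class here is a hand-written, hypothetical stand-in for the protobuf-generated one, and toRow() is an illustrative helper, not a parquet-protobuf API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OneofRowSketch {

    /** Which oneof member is set, loosely mirroring a generated *Case enum. */
    enum BarCase { BAR_INT, BAR_INT2, BAR_STRING, NOT_SET }

    /** Hypothetical stand-in for the protobuf-generated Person class. */
    static final class Person {
        private final int foo;
        private final BarCase barCase;
        private final Object bar;

        private Person(int foo, BarCase barCase, Object bar) {
            this.foo = foo;
            this.barCase = barCase;
            this.bar = bar;
        }

        static Person withBarInt(int foo, int value) {
            return new Person(foo, BarCase.BAR_INT, value);
        }

        static Person withBarString(int foo, String value) {
            return new Person(foo, BarCase.BAR_STRING, value);
        }

        static Person withoutBar(int foo) {
            return new Person(foo, BarCase.NOT_SET, null);
        }

        int getFoo() { return foo; }

        boolean hasBarInt() { return barCase == BarCase.BAR_INT; }
        boolean hasBarInt2() { return barCase == BarCase.BAR_INT2; }
        boolean hasBarString() { return barCase == BarCase.BAR_STRING; }

        // Like protobuf getters, these return the type default when unset.
        int getBarInt() { return hasBarInt() ? (Integer) bar : 0; }
        int getBarInt2() { return hasBarInt2() ? (Integer) bar : 0; }
        String getBarString() { return hasBarString() ? (String) bar : ""; }
    }

    /** Builds one output row; unset oneof members become null, not defaults. */
    static Map<String, Object> toRow(Person p) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("foo", p.getFoo());
        row.put("bar_int", p.hasBarInt() ? p.getBarInt() : null);
        row.put("bar_int2", p.hasBarInt2() ? p.getBarInt2() : null);
        row.put("bar_string", p.hasBarString() ? p.getBarString() : null);
        return row;
    }

    public static void main(String[] args) {
        // Same shape as the reproduction: only foo and bar_string are set.
        for (int i = 0; i < 3; i += 1) {
            System.out.println(toRow(Person.withBarString(i, "hello world")));
        }
    }
}
```

Note that an explicitly set bar_int of 0 is still written as 0, because the decision is based on the has*() check rather than on the value itself.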

On Tue, Sep 29, 2020 at 10:55 AM Aaron Niskode-Dossett <
[email protected]> wrote:

> Thank you, David, I agree with your conclusions.  I opened PARQUET-1917.
>
> On Tue, Sep 29, 2020 at 10:18 AM David <[email protected]> wrote:
>
>> Hello,
>>
>> Perhaps a bit more nuance here.  I believe that the values are technically
>> correct (they should be the default value of 0), but we should not be
>> storing them as 0 values.  We need to check the hasBar*() methods to
>> determine whether the value should be stored or omitted.
>>
>> Thanks.
>>
>> On Tue, Sep 29, 2020 at 10:39 AM David <[email protected]> wrote:
>>
>> > Hello,
>> >
>> > I have been poking around the Parquet-Proto package as well.
>> >
>> > I would expect "bar_int" and "bar_int2" to be 'null' here.
>> >
>> > Have you filed a JIRA with this reproduction?
>> >
>> > Thanks.
>> >
>> > On Fri, Sep 25, 2020 at 9:58 AM Aaron Niskode-Dossett
>> > <[email protected]> wrote:
>> >
>> >> Hello,
>> >>
>> >> I am experimenting with serializing protobuf3 to parquet and have a
>> >> question about how "oneof" fields should be treated.  I will describe an
>> >> example.  I'm running parquet 1.11.1 with PARQUET-1684 applied.  That JIRA
>> >> is about how default values are written out, and seems related to my
>> >> question.
>> >>
>> >> SCHEMA
>> >> --------
>> >> message Person {
>> >>   int32 foo = 1;
>> >>   oneof optional_bar {
>> >>     int32 bar_int = 200;
>> >>     int32 bar_int2 = 201;
>> >>     string bar_string = 300;
>> >>   }
>> >> }
>> >>
>> >> CODE
>> >> --------
>> >> I set values for foo and bar_string
>> >>
>> >> for (int i = 0; i < 3; i += 1) {
>> >>     com.etsy.grpcparquet.Person message = Person.newBuilder()
>> >>             .setFoo(i)
>> >>             .setBarString("hello world")
>> >>             .build();
>> >>     message.writeDelimitedTo(out);
>> >> }
>> >> And then I write the protobuf file out to parquet.
>> >>
>> >> RESULT
>> >> -----------
>> >> $ parquet-tools show example.parquet
>> >>
>> >>
>> >> +-------+-----------+------------+--------------+
>> >> |   foo |   bar_int |   bar_int2 | bar_string   |
>> >> |-------+-----------+------------+--------------|
>> >> |     0 |         0 |          0 | hello world  |
>> >> |     1 |         0 |          0 | hello world  |
>> >> |     2 |         0 |          0 | hello world  |
>> >> +-------+-----------+------------+--------------+
>> >>
>> >> I would expect that bar_int and bar_int2 are EMPTY for all three rows,
>> >> since only bar_string is set in the oneof.
>> >>
>> >> Is this the right expectation for me to have?
>> >>
>> >> Thank you!
>> >>
>> >> --
>> >> Aaron Niskode-Dossett, Data Engineering -- Etsy
>> >>
>> >
>>
>
>
> --
> Aaron Niskode-Dossett, Data Engineering -- Etsy
>


-- 
Aaron Niskode-Dossett, Data Engineering -- Etsy
