Thank you, David, I agree with your conclusions.  I opened PARQUET-1917.
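
For anyone following along, here is a minimal sketch of the presence check
David describes. It is only an illustration, not the parquet-protobuf
implementation: the Person class below is a hand-written, hypothetical
stand-in for the protobuf-generated class, and BarCase, toRow and the
hasBar*() methods on it are names assumed for this example. The idea is that
a oneof member is written only when it is actually set, and is otherwise
stored as null instead of its default value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the generated Person message; not real protobuf code.
final class OneofPresenceSketch {

    // Which member of the oneof, if any, is currently set (assumed names).
    enum BarCase { BAR_INT, BAR_INT2, BAR_STRING, NOT_SET }

    static final class Person {
        final int foo;
        final BarCase barCase;
        final Object barValue; // Integer, String, or null when nothing is set

        Person(int foo, BarCase barCase, Object barValue) {
            this.foo = foo;
            this.barCase = barCase;
            this.barValue = barValue;
        }

        boolean hasBarInt() { return barCase == BarCase.BAR_INT; }
        boolean hasBarInt2() { return barCase == BarCase.BAR_INT2; }
        boolean hasBarString() { return barCase == BarCase.BAR_STRING; }

        // Getters fall back to the type's default when the member is unset,
        // which is why a writer that only calls getters ends up storing 0.
        int getBarInt() { return hasBarInt() ? (Integer) barValue : 0; }
        int getBarInt2() { return hasBarInt2() ? (Integer) barValue : 0; }
        String getBarString() { return hasBarString() ? (String) barValue : ""; }
    }

    // Builds one output row; unset oneof members become null rather than a default.
    static Map<String, Object> toRow(Person p) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("foo", p.foo);
        row.put("bar_int", p.hasBarInt() ? Integer.valueOf(p.getBarInt()) : null);
        row.put("bar_int2", p.hasBarInt2() ? Integer.valueOf(p.getBarInt2()) : null);
        row.put("bar_string", p.hasBarString() ? p.getBarString() : null);
        return row;
    }

    public static void main(String[] args) {
        // Mirrors the loop from my original mail: only foo and bar_string are set.
        for (int i = 0; i < 3; i += 1) {
            System.out.println(toRow(new Person(i, BarCase.BAR_STRING, "hello world")));
        }
    }
}
```

With this check in place, bar_int and bar_int2 come out as null for all three
rows, which is the result I expected in my original mail.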

On Tue, Sep 29, 2020 at 10:18 AM David <[email protected]> wrote:

> Hello,
>
> Perhaps a bit more nuance here.  I believe that the values are technically
> correct (the getters return the default value of 0 for an unset field), but
> we should not be storing them as 0 values.  We need to check the hasBar*()
> methods to determine whether each value should be stored or omitted.
>
> Thanks.
>
> On Tue, Sep 29, 2020 at 10:39 AM David <[email protected]> wrote:
>
> > Hello,
> >
> > I have been poking around the Parquet-Proto package as well.
> >
> > I would expect "bar_int" and "bar_int2" to be 'null' here.
> >
> > Have you filed a JIRA with this reproduction?
> >
> > Thanks.
> >
> > On Fri, Sep 25, 2020 at 9:58 AM Aaron Niskode-Dossett
> > <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> I am experimenting with serializing protobuf3 to parquet and have a
> >> question about how "oneof" fields should be treated.  I will describe an
> >> example.  I'm running parquet 1.11.1 with PARQUET-1684 applied.  That JIRA
> >> is about how default values are written out, and seems related to my
> >> question.
> >>
> >> SCHEMA
> >> --------
> >> message Person {
> >>   int32 foo = 1;
> >>   oneof optional_bar {
> >>     int32 bar_int = 200;
> >>     int32 bar_int2 = 201;
> >>     string bar_string = 300;
> >>   }
> >> }
> >>
> >> CODE
> >> --------
> >> I set values for foo and bar_string
> >>
> >> for (int i = 0; i < 3; i += 1) {
> >>     com.etsy.grpcparquet.Person message = Person.newBuilder()
> >>             .setFoo(i)
> >>             .setBarString("hello world")
> >>             .build();
> >>     message.writeDelimitedTo(out);
> >> }
> >>
> >> And then I write the protobuf file out to parquet.
> >>
> >> RESULT
> >> -----------
> >> $ parquet-tools show example.parquet
> >>
> >>
> >> +-------+-----------+------------+--------------+
> >> |   foo |   bar_int |   bar_int2 | bar_string   |
> >> |-------+-----------+------------+--------------|
> >> |     0 |         0 |          0 | hello world  |
> >> |     1 |         0 |          0 | hello world  |
> >> |     2 |         0 |          0 | hello world  |
> >> +-------+-----------+------------+--------------+
> >>
> >> I would expect that bar_int and bar_int2 are EMPTY for all three rows,
> >> since only bar_string is set in the oneof.
> >>
> >> Is this the right expectation for me to have?
> >>
> >> Thank you!
> >>
> >> --
> >> Aaron Niskode-Dossett, Data Engineering -- Etsy
> >>
> >
>


-- 
Aaron Niskode-Dossett, Data Engineering -- Etsy
