Hello,

Perhaps a bit more nuance here.  I believe the values are technically
correct (reading an unset member of the oneof returns its default value of
0), but we should not be storing them as 0 values.  We need to check the
hasBar*() methods to determine whether each value should be stored or
omitted.
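
A rough sketch of the check I have in mind, not the actual parquet-protobuf
code.  The Person class below is a hand-written stand-in for what protoc
would generate from the schema quoted further down, so that the example
compiles on its own; the class name OneofRowSketch and the toRow() helper
are made up purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OneofRowSketch {

    // Stand-in for the enum protoc would generate for "oneof optional_bar".
    enum OptionalBarCase { BAR_INT, BAR_INT2, BAR_STRING, OPTIONALBAR_NOT_SET }

    // Hand-written stand-in for the generated Person message (assumption:
    // it only mimics the accessors relevant to this discussion).
    static final class Person {
        private final int foo;
        private final OptionalBarCase barCase;
        private final Object bar;

        Person(int foo, OptionalBarCase barCase, Object bar) {
            this.foo = foo;
            this.barCase = barCase;
            this.bar = bar;
        }

        int getFoo() { return foo; }

        boolean hasBarInt() { return barCase == OptionalBarCase.BAR_INT; }
        boolean hasBarInt2() { return barCase == OptionalBarCase.BAR_INT2; }
        boolean hasBarString() { return barCase == OptionalBarCase.BAR_STRING; }

        // As in protobuf, getters on an unset oneof member return the default.
        int getBarInt() { return hasBarInt() ? (Integer) bar : 0; }
        int getBarInt2() { return hasBarInt2() ? (Integer) bar : 0; }
        String getBarString() { return hasBarString() ? (String) bar : ""; }
    }

    // Build one output row, storing null for oneof members that are not set
    // instead of storing their default value.
    static Map<String, Object> toRow(Person p) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("foo", p.getFoo());
        row.put("bar_int", p.hasBarInt() ? Integer.valueOf(p.getBarInt()) : null);
        row.put("bar_int2", p.hasBarInt2() ? Integer.valueOf(p.getBarInt2()) : null);
        row.put("bar_string", p.hasBarString() ? p.getBarString() : null);
        return row;
    }

    public static void main(String[] args) {
        // Same shape as the loop in the original report: only bar_string is set.
        for (int i = 0; i < 3; i += 1) {
            Person message = new Person(i, OptionalBarCase.BAR_STRING, "hello world");
            System.out.println(toRow(message));
        }
    }
}
```

With only bar_string set, bar_int and bar_int2 come out as null in every
row, while foo is still written as 0 in the first row because it is a plain
field and not part of the oneof.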

Thanks.

On Tue, Sep 29, 2020 at 10:39 AM David <[email protected]> wrote:

> Hello,
>
> I have been poking around the Parquet-Proto package as well.
>
> I would expect "bar_int" and "bar_int2" to be 'null' here.
>
> Have you filed a JIRA with this reproduction?
>
> Thanks.
>
> On Fri, Sep 25, 2020 at 9:58 AM Aaron Niskode-Dossett
> <[email protected]> wrote:
>
>> Hello,
>>
>> I am experimenting with serializing protobuf3 to parquet and have a
>> question about how "oneof" fields should be treated.  I will describe an
>> example.  I'm running parquet 1.11.1 with PARQUET-1684 applied.  That JIRA
>> is about how default values are written out, and seems related to my
>> question.
>>
>> SCHEMA
>> --------
>> message Person {
>>   int32 foo = 1;
>>   oneof optional_bar {
>>     int32 bar_int = 200;
>>     int32 bar_int2 = 201;
>>     string bar_string = 300;
>>   }
>> }
>>
>> CODE
>> --------
>> I set values for foo and bar_string
>>
>> for (int i = 0; i < 3; i += 1) {
>>     com.etsy.grpcparquet.Person message = Person.newBuilder()
>>             .setFoo(i)
>>             .setBarString("hello world")
>>             .build();
>>     message.writeDelimitedTo(out);
>> }
>>
>> And then I write the protobuf file out to parquet.
>>
>> RESULT
>> -----------
>> $ parquet-tools show example.parquet
>>
>>
>> +-------+-----------+------------+--------------+
>> |   foo |   bar_int |   bar_int2 | bar_string   |
>> |-------+-----------+------------+--------------|
>> |     0 |         0 |          0 | hello world  |
>> |     1 |         0 |          0 | hello world  |
>> |     2 |         0 |          0 | hello world  |
>> +-------+-----------+------------+--------------+
>>
>> I would expect that bar_int and bar_int2 are EMPTY for all three rows,
>> since only bar_string is set in the oneof.
>>
>> Is this the right expectation for me to have?
>>
>> Thank you!
>>
>> --
>> Aaron Niskode-Dossett, Data Engineering -- Etsy
>>
>
