Dear Parquet community:

We are working on a data pipeline that takes in protobuf data and writes it
out as Parquet. Currently we rely on the Parquet protobuf writer support in
parquet-mr <https://github.com/apache/parquet-mr/tree/master/parquet-protobuf>.

The existing Parquet protobuf writer preserves the full message structure of
a Protobuf definition. In our case, however, users often prefer to de-nest
the protobuf wrapper classes, so that each wrapped field is written directly
as its "value" data. We have implemented basic support for this on top of
the existing Parquet protobuf writer. For details, please refer to
PARQUET-1595 <https://issues.apache.org/jira/browse/PARQUET-1595>.
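As a rough illustration of the difference, assume a proto3 message that uses
the well-known wrapper type google.protobuf.StringValue. The message name
Example and the field name are hypothetical, and the Parquet schemas are
simplified (field ids omitted; the exact repetition and logical-type
annotations depend on the parquet-protobuf version):

```text
// Hypothetical protobuf definition
message Example {
  google.protobuf.StringValue name = 1;
}

// Simplified schema with the wrapper preserved (current behavior)
message Example {
  optional group name {
    optional binary value (STRING);
  }
}

// Simplified schema with the wrapper de-nested (proposed behavior)
message Example {
  optional binary name (STRING);
}
```

With the wrapper de-nested, queries can read the column as `name` rather
than `name.value`.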

We would like to solicit comments, and would be happy to contribute this if
the community thinks it is an idea worth pursuing. Any comments or pointers
to related prior discussions are welcome.

Thanks!

-
Ying
