Hello, Ying

From my own experience, the proposal seems interesting. To give some more
context about these "protobuf wrappers" for people who are not familiar
with them: protobuf3 drops support for "null" semantics for primitives,
both in its wire format and in its API. For people who wish to have
nullable fields, it provides "wrapper" messages that nest the primitive
field in a struct. The current parquet-protobuf implementation converts the
protobuf schema to a Parquet schema faithfully, so every wrapper becomes an
intermediate struct in the Parquet field path. De-nesting those wrappers
should make the Parquet file (schema) easier to use.

In the meantime, it seems to me the proposal is focused mostly on writing.
It may be worth thinking about how to keep reading backward/forward
compatible.
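
To make the effect on field paths concrete, here is a minimal sketch in
Python. It is not the parquet-protobuf code: the dict-based schema model,
the example message, and the helper names (denest, leaf_paths) are all
made up for illustration. Only the wrapper type names come from
google/protobuf/wrappers.proto.

```python
# The nine well-known wrapper messages; each has a single field named "value".
WRAPPER_TYPES = {
    "google.protobuf." + name
    for name in (
        "DoubleValue", "FloatValue", "Int64Value", "UInt64Value",
        "Int32Value", "UInt32Value", "BoolValue", "StringValue", "BytesValue",
    )
}

# Hypothetical schema model: a field is either a primitive type name (str)
# or a group: {"proto_type": <message name>, "fields": {name: field}}.
SCHEMA = {
    "id": "int64",
    "age": {
        "proto_type": "google.protobuf.Int32Value",
        "fields": {"value": "int32"},
    },
    "address": {
        "proto_type": "example.Address",  # made-up message type
        "fields": {
            "zip": {
                "proto_type": "google.protobuf.StringValue",
                "fields": {"value": "string"},
            },
        },
    },
}


def denest(fields):
    """Replace every wrapper group by the primitive type of its "value" field."""
    result = {}
    for name, field in fields.items():
        if isinstance(field, str):
            # Plain primitive: keep as-is.
            result[name] = field
        elif field["proto_type"] in WRAPPER_TYPES:
            # Wrapper: drop the intermediate struct, keep the primitive.
            result[name] = field["fields"]["value"]
        else:
            # Ordinary message: keep the group, recurse into its fields.
            result[name] = {
                "proto_type": field["proto_type"],
                "fields": denest(field["fields"]),
            }
    return result


def leaf_paths(fields, prefix=""):
    """List the dotted paths of all primitive (leaf) columns."""
    paths = []
    for name, field in fields.items():
        path = prefix + name
        if isinstance(field, str):
            paths.append(path)
        else:
            paths.extend(leaf_paths(field["fields"], path + "."))
    return paths


print(leaf_paths(SCHEMA))          # paths with the wrapper structs kept
print(leaf_paths(denest(SCHEMA)))  # paths after de-nesting
```

In this toy model a file written with the faithful schema has a column
such as "age.value" while the de-nested one has "age", which is exactly
where the reading-compatibility question comes from.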

cc @lukasnalezenec @zivanfi @rdblue

Best regards,


On Fri, 14 Jun 2019 at 02:42, ying <[email protected]> wrote:

> Dear Parquet community:
>
> We are working on a data pipeline which takes in protobuf data and writes it
> to Parquet. Currently we take advantage of the Parquet proto writer support
> <https://github.com/apache/parquet-mr/tree/master/parquet-protobuf>.
>
> While the existing Parquet protobuf writer preserves all the message
> structure of a Protobuf definition, in our case users often prefer
> de-nesting the protobuf wrapper classes and filling in the same field with
> simply its "value" data.  We have implemented some basic functionality to
> achieve this, on top of the existing Parquet-proto writer. For details,
> please refer to PARQUET-1595
> <https://issues.apache.org/jira/browse/PARQUET-1595>.
>
> We would like to solicit comments, and would be happy to contribute if the
> community thinks it is a sound idea to pursue.  Any comments or pointers to
> related prior discussions are welcome.
>
> Thanks!
>
> -
> Ying
>
