[
https://issues.apache.org/jira/browse/SPARK-44001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-44001.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 43767
[https://github.com/apache/spark/pull/43767]
> Improve parsing of well known wrapper types
> -------------------------------------------
>
> Key: SPARK-44001
> URL: https://issues.apache.org/jira/browse/SPARK-44001
> Project: Spark
> Issue Type: Improvement
> Components: Protobuf
> Affects Versions: 3.4.0
> Reporter: Parth Upadhyay
> Assignee: Parth Upadhyay
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Under `com.google.protobuf`, there are some well known wrapper types for
> primitives,
> [namely|https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto],
> useful for distinguishing between absence of primitive fields and their
> default values, as well as for use within `google.protobuf.Any` types. These
> types are:
> {code}
> DoubleValue
> FloatValue
> Int64Value
> UInt64Value
> Int32Value
> UInt32Value
> BoolValue
> StringValue
> BytesValue
> {code}
> Currently, when we deserialize these from a serialized protobuf into a Spark
> struct, we expand them as if they were normal messages. Concretely, if we have
> {code}
> syntax = "proto3";
> import "google/protobuf/wrappers.proto";
> message WktExample {
> google.protobuf.BoolValue bool_val = 1;
> google.protobuf.Int32Value int32_val = 2;
> }
> {code}
> And a message like
> {code}
> WktExample(true, 100)
> {code}
> Then the behavior today is to deserialize this as:
> {code}
> {"bool_val": {"value": true}, "int32_val": {"value": 100}}
> {code}
> This is quite difficult to work with and not in the spirit of the wrapper
> type, so it would be nice to deserialize as
> {code}
> {"bool_val": true, "int32_val": 100}
> {code}
> This is also the behavior of other popular deserialization libraries,
> including Java's protobuf-util
> [JsonFormat|https://github.com/protocolbuffers/protobuf/blob/main/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L904-L914]
> and Go's
> [jsonpb|https://github.com/gogo/protobuf/blob/master/jsonpb/jsonpb.go#L207-L214].
> So for consistency with other libraries and improved usability, I propose we
> deserialize well known types in this way.
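As a rough illustration of the flattening proposed in the description, the sketch below post-processes an already-expanded record in plain Python. It is not the Spark implementation and does not use the protobuf library. The function name `unwrap_wrappers` and the `field_types` mapping (field name to fully qualified protobuf type name) are assumptions made up for this example.

```python
# Hypothetical sketch: collapse {"value": x} into x for wrapper-typed fields.
# Names here (unwrap_wrappers, field_types) are illustrative, not Spark APIs.

WRAPPER_TYPES = {
    "google.protobuf.DoubleValue",
    "google.protobuf.FloatValue",
    "google.protobuf.Int64Value",
    "google.protobuf.UInt64Value",
    "google.protobuf.Int32Value",
    "google.protobuf.UInt32Value",
    "google.protobuf.BoolValue",
    "google.protobuf.StringValue",
    "google.protobuf.BytesValue",
}


def unwrap_wrappers(record, field_types):
    """Return a copy of record with wrapper-typed fields reduced to their value.

    record: dict of field name -> deserialized value (wrappers appear as
            {"value": ...} dicts, or None when the field is absent).
    field_types: dict of field name -> fully qualified protobuf type name.
    """
    result = {}
    for name, value in record.items():
        is_wrapper = field_types.get(name) in WRAPPER_TYPES
        if is_wrapper and isinstance(value, dict):
            # Keep only the wrapped primitive.
            result[name] = value.get("value")
        else:
            # Non-wrapper fields and absent wrappers (None) pass through unchanged.
            result[name] = value
    return result


field_types = {
    "bool_val": "google.protobuf.BoolValue",
    "int32_val": "google.protobuf.Int32Value",
}
expanded = {"bool_val": {"value": True}, "int32_val": {"value": 100}}
flattened = unwrap_wrappers(expanded, field_types)
print(flattened)  # {'bool_val': True, 'int32_val': 100}
```

An absent wrapper field stays `None` in this sketch, so it remains distinguishable from a wrapper holding the primitive default, which is the reason for using wrapper types in the first place.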