[
https://issues.apache.org/jira/browse/FLINK-33817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799959#comment-17799959
]
Sai Sharath Dandi commented on FLINK-33817:
-------------------------------------------
Yes, it is supported for message fields as well (although I'm not exactly sure
which version). See this
[documentation|https://protobuf.dev/programming-guides/field_presence/#how-to-enable-explicit-presence-in-proto3]
for more details. In fact, the hasXXX() method is supported for Primitive
types also if defined as optional
> Allow ReadDefaultValues = False for non primitive types on Proto3
> -----------------------------------------------------------------
>
> Key: FLINK-33817
> URL: https://issues.apache.org/jira/browse/FLINK-33817
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.18.0
> Reporter: Sai Sharath Dandi
> Priority: Major
>
> *Background*
>
> The current Protobuf format
> [implementation|https://github.com/apache/flink/blob/c3e2d163a637dca5f49522721109161bd7ebb723/flink-formats/flink-protobuf/src/main/java/org/apache/flink/formats/protobuf/deserialize/ProtoToRowConverter.java]
> always sets ReadDefaultValues=False when using Proto3 version. This can
> cause severe performance degradation for large Protobuf schemas with OneOf
> fields as the entire generated code needs to be executed during
> deserialization even when certain fields are not present in the data to be
> deserialized and all the subsequent nested Fields can be skipped. Proto3
> supports hasXXX() methods for checking field presence for non primitive types
> since Proto version
> [3.15|https://github.com/protocolbuffers/protobuf/releases/tag/v3.15.0]. In
> the internal performance benchmarks in our company, we've seen almost 10x
> difference in performance for one of our real production usecase when
> allowing to set ReadDefaultValues=False with proto3 version. The exact
> difference in performance depends on the schema complexity and data payload
> but we should allow user to set readDefaultValue=False in general.
>
> *Solution*
>
> Support using ReadDefaultValues=False when using Proto3 version. We need to
> be careful to check for field presence only on non-primitive types if
> ReadDefaultValues is false and version used is Proto3
--
This message was sent by Atlassian Jira
(v8.20.10#820010)