[
https://issues.apache.org/jira/browse/FLINK-33817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812045#comment-17812045
]
Sai Sharath Dandi commented on FLINK-33817:
-------------------------------------------
[~libenchao] [~maosuhan] , Gentle ping on this ticket as there is a very high
performance impact here for the Protobuf format.
> Allow ReadDefaultValues = False for non primitive types on Proto3
> -----------------------------------------------------------------
>
> Key: FLINK-33817
> URL: https://issues.apache.org/jira/browse/FLINK-33817
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.18.0
> Reporter: Sai Sharath Dandi
> Priority: Major
> Labels: pull-request-available
>
> *Background*
>
> The current Protobuf format
> [implementation|https://github.com/apache/flink/blob/c3e2d163a637dca5f49522721109161bd7ebb723/flink-formats/flink-protobuf/src/main/java/org/apache/flink/formats/protobuf/deserialize/ProtoToRowConverter.java]
> always sets ReadDefaultValues=False when using Proto3 version. This can
> cause severe performance degradation for large Protobuf schemas with OneOf
> fields as the entire generated code needs to be executed during
> deserialization even when certain fields are not present in the data to be
> deserialized and all the subsequent nested Fields can be skipped. Proto3
> supports hasXXX() methods for checking field presence for non primitive types
> since Proto version
> [3.15|https://github.com/protocolbuffers/protobuf/releases/tag/v3.15.0]. In
> the internal performance benchmarks in our company, we've seen almost 10x
> difference in performance for one of our real production usecase when
> allowing to set ReadDefaultValues=False with proto3 version. The exact
> difference in performance depends on the schema complexity and data payload
> but we should allow user to set readDefaultValue=False in general.
>
> *Solution*
>
> Support using ReadDefaultValues=False when using Proto3 version. We need to
> be careful to check for field presence only on non-primitive types if
> ReadDefaultValues is false and version used is Proto3
--
This message was sent by Atlassian Jira
(v8.20.10#820010)