[ 
https://issues.apache.org/jira/browse/FLINK-37528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Sharath Dandi updated FLINK-37528:
--------------------------------------
    Description: 
The read-default-values setting is
[forced|https://github.com/apache/flink/blob/master/flink-formats/flink-protobuf/src/main/java/org/apache/flink/formats/protobuf/deserialize/ProtoToRowConverter.java#L74]
to true for primitive types in proto3. This can cause bugs for messages that
contain a oneof such as the one below:
{code:java}
oneof test {
  string aa = 1;
  int32 bb = 2;
  bool cc = 3;
  Corpus dd = 4;
}
{code}
Even if only the first field of the oneof is set, reading default values
makes every field non-null after decoding. When such data is encoded back to
protobuf, the result differs from the original message, which is a data
correctness issue.
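
The round trip described above can be sketched with a simplified model that does not use the Flink or protobuf libraries. The class OneofRoundTrip and its methods are hypothetical. The sketch assumes that the encoder writes every non-null column in declaration order. In protobuf, setting one member of a oneof clears the others, so only the last member written survives.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified, hypothetical model of decoding and re-encoding "oneof test". */
final class OneofRoundTrip {
    // Members of the oneof in declaration order.
    static final String[] FIELDS = {"aa", "bb", "cc", "dd"};

    // proto3 default for each member; the enum is modeled by its number 0.
    static Object defaultFor(String field) {
        switch (field) {
            case "aa": return "";
            case "bb": return 0;
            case "cc": return false;
            default:   return 0;
        }
    }

    /** Decodes a message in which only setField is present into a row. */
    static Map<String, Object> decode(String setField, Object value, boolean readDefaults) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String f : FIELDS) {
            if (f.equals(setField)) {
                row.put(f, value);
            } else {
                // With readDefaults, unset members become non-null defaults.
                row.put(f, readDefaults ? defaultFor(f) : null);
            }
        }
        return row;
    }

    /** Re-encodes the row and returns the oneof case that ends up set. */
    static String encode(Map<String, Object> row) {
        String activeCase = null;
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getValue() != null) {
                // Assumption: each non-null column is written; the last write wins.
                activeCase = e.getKey();
            }
        }
        return activeCase;
    }

    public static void main(String[] args) {
        System.out.println(encode(decode("aa", "hello", true)));
        System.out.println(encode(decode("aa", "hello", false)));
    }
}
```

Under these assumptions, the first call ends with dd as the active case although aa was set originally, while the second call preserves aa.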

Solution:
{code:java}
if (PbFormatUtils.isSimpleType(subType)
        && !(elementFd.getContainingOneof() != null || elementFd.hasOptionalKeyword())) {
    readDefaultValues = formatContext.isReadDefaultValuesForPrimitiveTypes();
}
{code}
For primitive types in proto3, field presence can still be checked when the
field is declared optional (available by default since protobuf 3.15) or is
part of a oneof.
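
The rule proposed above can be illustrated in isolation. The following is a minimal sketch, not the Flink implementation: FieldInfo and ReadDefaultsRule are hypothetical stand-ins for the protobuf field descriptor and the format context, and the parameter names are assumptions.

```java
/** Hypothetical stand-in for the descriptor properties the rule inspects. */
final class FieldInfo {
    final String name;
    final boolean simpleType;      // primitive, string, or enum rather than a message
    final boolean inOneof;         // mirrors getContainingOneof() != null
    final boolean optionalKeyword; // mirrors hasOptionalKeyword()

    FieldInfo(String name, boolean simpleType, boolean inOneof, boolean optionalKeyword) {
        this.name = name;
        this.simpleType = simpleType;
        this.inOneof = inOneof;
        this.optionalKeyword = optionalKeyword;
    }
}

final class ReadDefaultsRule {
    /**
     * Returns whether default values should be read for the field.
     * configured: the value that applies when the primitive override does not.
     * forPrimitives: the override for simple fields without presence tracking.
     */
    static boolean readDefaultValues(FieldInfo f, boolean configured, boolean forPrimitives) {
        boolean tracksPresence = f.inOneof || f.optionalKeyword;
        if (f.simpleType && !tracksPresence) {
            // No presence information exists, so the primitive override applies.
            return forPrimitives;
        }
        // oneof members and optional fields keep the configured behavior.
        return configured;
    }

    public static void main(String[] args) {
        FieldInfo plain = new FieldInfo("count", true, false, false);
        FieldInfo oneofMember = new FieldInfo("bb", true, true, false);
        FieldInfo optional = new FieldInfo("age", true, false, true);
        for (FieldInfo f : new FieldInfo[] {plain, oneofMember, optional}) {
            System.out.println(f.name + ": " + readDefaultValues(f, false, true));
        }
    }
}
```

In this sketch only the plain field picks up the primitive override. The oneof member and the optional field fall through to the configured value, so an unset field can stay null after decoding.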

 

  was:
The read-default-values setting is
[forced|https://github.com/apache/flink/blob/master/flink-formats/flink-protobuf/src/main/java/org/apache/flink/formats/protobuf/deserialize/ProtoToRowConverter.java#L74]
to true for primitive types in proto3. This can cause bugs for messages that
contain a oneof such as the one below:


{code:java}
oneof test {
  string aa = 1;
  int32 bb = 2;
  bool cc = 3;
  Corpus dd = 4;
}
{code}
Even if only the first field of the oneof is set, reading default values
makes every field non-null after decoding. When such data is encoded back to
protobuf, the result differs from the original message, which is a data
correctness issue.


Solution:
{code:java}
if (PbFormatUtils.isSimpleType(subType)
        && !(elementFd.getContainingOneof() != null || elementFd.hasOptionalKeyword())) {
    readDefaultValues = formatContext.isReadDefaultValuesForPrimitiveTypes();
}
{code}
For primitive types in proto3, field presence can still be checked when the
field is declared optional or is part of a oneof.

 


> Protobuf Format (proto3): Handle default values for optional primitive types 
> and primitive types in one of fields 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-37528
>                 URL: https://issues.apache.org/jira/browse/FLINK-37528
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 2.0-preview
>            Reporter: Sai Sharath Dandi
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
