sv3ndk edited a comment on pull request #14376:
URL: https://github.com/apache/flink/pull/14376#issuecomment-809618142
Thanks a lot for working on a Protobuf format! We also use that
serialization mechanism at my company and will need such format soon, this is
very useful.
I have a different understanding regarding the handling of missing values by
protobuf 3 and I believe we should be able to provide Fllink with nulls in case
of missing value in pb. @maosuhan, if you want I'd be happy to collaborate with
you on this and add support for this case.
As I understand it:
* protobuf 3 considers every field as optional
* if a pb field is a complex type, the generated java code contains a
`hasXYZ()` method to determine if that field is present
* if a pb field is a scalar, no such method gets generated
* when a pb field value is not specified in a pb instance, the `getXYZ()`
method returns the default value (i.e `""` instead of `null` in case of string)
The way we approach this where I work is:
* we only only use protobuf scalar types for non nullable fields
* we rely on wrappers (like `google.protobuf.StringValue`) for nullable java
primitives
* when reading data, we always check `hasXYZ()` before calling `getXYZ()`
Here's a quick example, generating java classes with pb `3.15.0`:
Given this schema:
```
syntax = "proto3";
import "google/protobuf/wrappers.proto";
message Parent {
string id = 1;
google.protobuf.StringValue name = 2;
Inner innerValue = 3;
message Inner {
string f1 = 1;
string f2 = 2;
}
}
```
and this `roundTrip()` method:
```java
Parent roundTrip(Parent parent) throws InvalidProtocolBufferException {
return Parent.parser().parseFrom(parent.toByteArray());
}
```
Those assertions show that the missing `name` field can be correctly
interpreted both before and after serialization:
```java
var withoutName = Parent.newBuilder()
.setId("theId")
.setInnerValue(
Parent.Inner.newBuilder()
.setF1("theF1")
.setF2("theF1")
.build()
)
.build();
assert ! withoutName.hasName();
assert ! roundTrip(withoutName).hasName();
assert withoutName.hasInnerValue();
assert roundTrip(withoutName).hasInnerValue();
assert withoutName.getInnerValue().getF1().equals("theF1");
assert roundTrip(withoutName).getInnerValue().getF1().equals("theF1");
//assert ! hasNoName.hasId(); // does not compile: hasXYZ()
does not exist if XYZ is a scalar
```
Similarly, this instance with a missing nested field can be interpreted
correctly as well by a reader:
```java
var withoutInner = Parent.newBuilder()
.setId("theId")
.setName(StringValue.newBuilder().setValue("theName").build())
.build();
assert ! withoutInner.hasInnerValue();
assert ! roundTrip(withoutInner).hasInnerValue();
assert withoutInner.hasName();
assert roundTrip(withoutInner).hasName();
assert withoutInner.getName().getValue().equals("theName");
assert roundTrip(withoutInner).getName().getValue().equals("theName");
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]