sv3ndk edited a comment on pull request #14376:
URL: https://github.com/apache/flink/pull/14376#issuecomment-809618142


   Thanks a lot for working on a Protobuf format! We also use that 
serialization mechanism at my company and will need such format  soon, this is 
very useful.
   
   I have a different understanding regarding the handling of missing values by 
protobuf 3 and I believe we should be able to provide Fllink with nulls in case 
of missing value in pb. @maosuhan, if you want I'd be happy to collaborate with 
you on this and add support for this case.
   
   As I understand it:
   
   * protobuf 3 considers every field as optional
   * if a pb field is a complex type, the generated java code contains a 
`hasXYZ()` method to determine if that field is present
   * if a pb field is a scalar, no such method gets generated
   * when a pb field value is not specified in a pb instance, the `getXYZ()` 
method returns the default value (i.e `""` instead of `null` in case of string)
   
   The way we approach this where I work is:
   * we only only use protobuf scalar types for non nullable fields
   * we rely on wrappers (like `google.protobuf.StringValue`) for nullable java 
primitives
   * when reading data, we always check `hasXYZ()` before calling `getXYZ()`
   
   Here's a quick example, generating java classes with pb `3.15.0`:
   
   Given this schema:
   ```
   syntax = "proto3";
   import "google/protobuf/wrappers.proto";
   
   message Parent {
     string id = 1;
     google.protobuf.StringValue name = 2;
     Inner innerValue = 3;
   
     message Inner {
       string f1 = 1;
       string f2 = 2;
     }
   }
   ```
   
   and this `roundTrip()` method:
   
   ```java
     Parent roundTrip(Parent parent) throws InvalidProtocolBufferException {
       return Parent.parser().parseFrom(parent.toByteArray());
     }
   ```
   
   Those assertions show that the missing `name` field can be correctly 
interpreted both before and after serialization:
   
   ```java
    var withoutName = Parent.newBuilder()
           .setId("theId")
           .setInnerValue(
               Parent.Inner.newBuilder()
                   .setF1("theF1")
                   .setF2("theF1")
                   .build()
           )
           .build();
   
       assert ! withoutName.hasName();
       assert ! roundTrip(withoutName).hasName();
       assert withoutName.hasInnerValue();
       assert roundTrip(withoutName).hasInnerValue();
       assert withoutName.getInnerValue().getF1().equals("theF1");
       assert roundTrip(withoutName).getInnerValue().getF1().equals("theF1");
       //assert ! hasNoName.hasId();           // does not compile: hasXYZ() 
does not exist if XYZ is a scalar
   ```
   
   Similarly, this instance with a missing nested field can be interpreted 
correctly as well by a reader:
   
   ```java
    var withoutInner = Parent.newBuilder()
           .setId("theId")
           .setName(StringValue.newBuilder().setValue("theName").build())
           .build();
   
       assert ! withoutInner.hasInnerValue();
       assert ! roundTrip(withoutInner).hasInnerValue();
       assert withoutInner.hasName();
       assert roundTrip(withoutInner).hasName();
       assert withoutInner.getName().getValue().equals("theName");
       assert roundTrip(withoutInner).getName().getValue().equals("theName");
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to