sv3ndk commented on pull request #14376:
URL: https://github.com/apache/flink/pull/14376#issuecomment-814650640


   Thanks for the feed-back @maosuhan ,
   
   Indeed, I see what you mean, I think you're correct and I'm missing one case.
   
   If I understand things correctly, the logic we want is, when deserializing a 
pb scalar field into a Fink row field:
   
   1. pb2:
      1.a. if the pb scalar field is specified => set the Flink row field to 
that value
      1.b. if the pb scalar field is not specified but has a default value => 
set the Flink row field to that default
      1.c. if the pb scalar field is not specified, has no default value and is 
optional => set the Flink row field to null
      1.d. if the pb scalar field is not specified, has no default value and is 
not optional => set the Flink row field to JVM default
   2. pb3:
      2.a. if the scalar pb field is specifed => set the Flink row field to 
that value
      2.b. (not applicable: there is no equivalent of 1.b in pb3)
      2.c. (not applicable: scalar field are never optional in pb3 since the 
java generated class always returns a value and no `hasXYZ()` is present in 
that case)
      2.d. if the pb field is not specified=> set the Flink row field to JVM 
default
   
   This logic is, I think, the closest we can get to the way the generated 
classes themselves behaves, s.t. most of it should be achievable with as little 
logic as possible in the Flink format, by simply delegating to the generated 
classes. I admit using "JVM default" is surprising, although that's how 
protobuf generated classes behaves. My understanding is that the author of a 
protobuf schema would never use a non optional scalar field for communicating a 
nullable value, precisely because of that behavior.
   
   At the moment, I think the code in my MR supports all those cases except 1.c 
(I just wrote a UT for that case and indeed it fails). That case has no 
equivalent in "pure protobuf 2" since in that case the generated java class 
contains a primitive field (which cannot be null), although as you pointed out 
I should use `hasXYZ()` in order to set null in the Flink row.
   
   Here is what I suggest:
   
   * I add a couple of UT that make explicit that omission in my code 
   * I fix the bug for case 1.c, by adding a condition in 
`isMessageNonEmptyStr()` 
   * I regroup the UT into `proto2` and `proto3` folders s.t. it's easier to 
see which cases are covered in which versions
   * If it's ok for you, I suggest I remove the `readDefaultValues` flag, and 
we apply the logic I listed above in all cases, in order to keep the behavior 
as close as possible to the nominal protobuf behavior
   
   I'll wait for your feed-back before coding anything.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to