sv3ndk commented on pull request #14376:
URL: https://github.com/apache/flink/pull/14376#issuecomment-814650640
Thanks for the feed-back @maosuhan ,
Indeed, I see what you mean, I think you're correct and I'm missing one case.
If I understand things correctly, the logic we want is, when deserializing a
pb scalar field into a Fink row field:
1. pb2:
1.a. if the pb scalar field is specified => set the Flink row field to
that value
1.b. if the pb scalar field is not specified but has a default value =>
set the Flink row field to that default
1.c. if the pb scalar field is not specified, has no default value and is
optional => set the Flink row field to null
1.d. if the pb scalar field is not specified, has no default value and is
not optional => set the Flink row field to JVM default
2. pb3:
2.a. if the scalar pb field is specifed => set the Flink row field to
that value
2.b. (not applicable: there is no equivalent of 1.b in pb3)
2.c. (not applicable: scalar field are never optional in pb3 since the
java generated class always returns a value and no `hasXYZ()` is present in
that case)
2.d. if the pb field is not specified=> set the Flink row field to JVM
default
This logic is, I think, the closest we can get to the way the generated
classes themselves behaves, s.t. most of it should be achievable with as little
logic as possible in the Flink format, by simply delegating to the generated
classes. I admit using "JVM default" is surprising, although that's how
protobuf generated classes behaves. My understanding is that the author of a
protobuf schema would never use a non optional scalar field for communicating a
nullable value, precisely because of that behavior.
At the moment, I think the code in my MR supports all those cases except 1.c
(I just wrote a UT for that case and indeed it fails). That case has no
equivalent in "pure protobuf 2" since in that case the generated java class
contains a primitive field (which cannot be null), although as you pointed out
I should use `hasXYZ()` in order to set null in the Flink row.
Here is what I suggest:
* I add a couple of UT that make explicit that omission in my code
* I fix the bug for case 1.c, by adding a condition in
`isMessageNonEmptyStr()`
* I regroup the UT into `proto2` and `proto3` folders s.t. it's easier to
see which cases are covered in which versions
* If it's ok for you, I suggest I remove the `readDefaultValues` flag, and
we apply the logic I listed above in all cases, in order to keep the behavior
as close as possible to the nominal protobuf behavior
I'll wait for your feed-back before coding anything.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]