luchunliang opened a new issue, #12141:
URL: https://github.com/apache/inlong/issues/12141

### What happened

When transform-sdk decodes Protobuf messages and writes them to typed sinks (Iceberg/Parquet via RowData), unset fields are output as protobuf default values (`0`, `""`, `false`, empty list) instead of null. Downstream sinks therefore write wrong data: for example, an unset `int64` field appears as `0` in Iceberg rather than `NULL`.

Additionally, when upstream protobuf messages are missing required fields (common in production with schema evolution), the decoder throws `UninitializedMessageException` and drops the entire record.

#### Root Cause

In protobuf-java, `DynamicMessage.getField(fieldDesc)` never returns null for an unset field; it returns the type's default value. The code must call `hasField()` before `getField()` to distinguish "field is set to 0" from "field is not set". Multiple locations in `PbSourceData` and `PbSourceDecoder` are missing this check.

#### Impact

- **Data correctness:** Unset numeric fields (int/long/float/double) are written as `0` instead of `NULL` in Iceberg, which is indistinguishable from a legitimately set zero value.
- **Data loss:** Messages with missing proto2 required fields cause `UninitializedMessageException`, and the entire record is silently dropped.
- **Null semantics broken:** `transformForBytes()` converts null field values to the empty string `""`, preventing downstream RowData encoders from emitting proper `NULL` values.

#### Fix

- `PbSourceDecoder.decode()`: Use `buildPartial()` instead of `build()` to tolerate missing required fields.
- `PbSourceData.buildStructData()`: Add a `hasField()` check before `getField()` for non-repeated fields.
- `PbSourceData.findNodeValue()`: Add a `hasField()` check before `getField()` for non-repeated fields.
- `PbSourceData.buildMapData()` / `parseMapNode()`: Add a `hasField()` check for the map entry key/value.
- `TransformProcessor.transformForBytes()`: Pass null instead of `""` when the field value is null, preserving null semantics for binary sinks.

#### Affected Versions

inlong-sdk/transform-sdk (all versions up to current master)

### What you expected to happen

Unset fields should be emitted as null rather than as protobuf default values, and messages with missing required fields should be decoded instead of dropped.

### How to reproduce

Use transform-sdk to decode a Protobuf message that leaves some fields unset, or that is missing a proto2 required field, and write the result to a typed sink (Iceberg/Parquet via RowData). The unset fields appear as default values (`0`, `""`, `false`, empty list) instead of null, and the message with the missing required field fails with `UninitializedMessageException` and is dropped.

### Environment

_No response_

### InLong version

master

### InLong Component

InLong SDK

### Are you willing to submit PR?
- [x] Yes, I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
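For readers who want to see the presence check from the root-cause description in isolation, here is a minimal Java sketch. It deliberately does not use protobuf-java: `FakeMessage` is a hypothetical stand-in whose `hasField()`/`getField()` only mimic the behavior described in this issue (a getter that falls back to the default value and never returns null for a declared field), and `readNaive`/`readNullable` are illustrative names, not methods from transform-sdk.

```java
import java.util.HashMap;
import java.util.Map;

class PresenceDemo {

    /** Hypothetical stand-in for a decoded message; this is NOT the protobuf-java API. */
    static final class FakeMessage {
        private final Map<String, Object> defaults = new HashMap<>();
        private final Map<String, Object> explicit = new HashMap<>();

        /** Declares a field together with its type's default value. */
        FakeMessage declare(String field, Object defaultValue) {
            defaults.put(field, defaultValue);
            return this;
        }

        /** Marks a field as explicitly set, even if the value equals the default. */
        FakeMessage set(String field, Object value) {
            explicit.put(field, value);
            return this;
        }

        /** True only when the field was explicitly set. */
        boolean hasField(String field) {
            return explicit.containsKey(field);
        }

        /** Returns the set value, or the declared default when the field is unset. */
        Object getField(String field) {
            return explicit.containsKey(field) ? explicit.get(field) : defaults.get(field);
        }
    }

    /** Problematic pattern: an unset field is reported as its default value. */
    static Object readNaive(FakeMessage msg, String field) {
        return msg.getField(field);
    }

    /** Presence-aware pattern: an unset field is reported as null. */
    static Object readNullable(FakeMessage msg, String field) {
        return msg.hasField(field) ? msg.getField(field) : null;
    }

    public static void main(String[] args) {
        FakeMessage unset = new FakeMessage().declare("count", 0L);
        FakeMessage zero = new FakeMessage().declare("count", 0L).set("count", 0L);

        System.out.println(readNaive(unset, "count"));    // prints 0
        System.out.println(readNullable(unset, "count")); // prints null
        System.out.println(readNullable(zero, "count"));  // prints 0
    }
}
```

With the presence-aware reader, a sink can tell an explicit zero from an unset field, which is the distinction the proposed `hasField()` checks are meant to restore for non-repeated fields.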
