[
https://issues.apache.org/jira/browse/FLINK-38144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087901#comment-18087901
]
Mukul Gupta edited comment on FLINK-38144 at 6/11/26 4:12 AM:
--------------------------------------------------------------
Hi, I'd like to work on this issue.
Problem: ChangelogMapState.put(key, null) and entry.setValue(null) both call
getWriter(key, value), which directly invokes valueSerializer.serialize(null,
out). This throws an NPE because most Flink serializers (IntSerializer, etc.)
are not null-safe. Both call sites of getWriter() in ChangelogMapState.java
are affected.
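A minimal sketch of the failure mode, using a hypothetical stand-in serializer
rather than Flink's actual classes (the names serializeInt and throwsNpe are
made up for illustration). It assumes the serializer passes the boxed value
straight to a primitive write, which fails on null during auto-unboxing:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class NullUnsafeSerializerSketch {

    // Hypothetical stand-in for a null-unsafe value serializer: writeInt takes
    // a primitive int, so a null Integer fails while being unboxed.
    static void serializeInt(Integer value, DataOutputStream out) throws IOException {
        out.writeInt(value);
    }

    // Returns true if serializing the given value throws a NullPointerException.
    static boolean throwsNpe(Integer value) {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            serializeInt(value, out);
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("non-null value fails: " + throwsNpe(42));
        System.out.println("null value fails: " + throwsNpe(null));
    }
}
```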
Previously proposed approaches (from PR #26831 discussion):
1. New StateMetaInfoReader version encoding null-tolerance — over-engineered,
affects all backends
2. New null-tolerant StateChangeOperation alongside ADD_OR_UPDATE_ELEMENT —
backward compat risk on downgrade
3. Ignore the problem
All were deferred because options 1 and 2 "break the serialization format and
introduce migration issues."
Proposed fix: Route put(key, null) and entry.setValue(null) through the
existing ADD operation path (used by putAll()) by calling
changeLogger.valueAdded(Collections.singletonMap(key, null), ns). The ADD path
serializes via MapSerializer.serialize(), which already has a boolean
null-flag protocol. On restore, MapStateChangeApplier handles ADD by calling
mapState.putAll(mapSerializer.deserialize(in)), which correctly reconstructs
null values.
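To illustrate what a boolean null-flag protocol looks like, here is a
self-contained sketch built on java.io only. It is not Flink's MapSerializer;
the class name, the String/Integer types, and the exact byte layout (size,
then key, null flag, and value per entry) are assumptions for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class NullFlagMapSketch {

    // Writes the entry count, then per entry: the key, a boolean null flag,
    // and the value only when it is non-null.
    static byte[] serialize(Map<String, Integer> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(map.size());
            for (Map.Entry<String, Integer> entry : map.entrySet()) {
                out.writeUTF(entry.getKey());
                Integer value = entry.getValue();
                out.writeBoolean(value == null);
                if (value != null) {
                    out.writeInt(value);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors serialize(): reads the flag first and skips the value read
    // when the flag marks the value as null.
    static Map<String, Integer> deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int size = in.readInt();
            Map<String, Integer> map = new HashMap<>();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                boolean isNull = in.readBoolean();
                if (isNull) {
                    map.put(key, null);
                } else {
                    map.put(key, in.readInt());
                }
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> restored =
                deserialize(serialize(Collections.singletonMap("key", null)));
        System.out.println(restored.containsKey("key") + " " + restored.get("key"));
    }
}
```

Because the flag is written before the value, the reader never has to hand a
null to the value serializer, which is the property the ADD path relies on.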
Why this avoids the previously raised concerns:
- No serialization format change: the ADD operation and the MapSerializer null
protocol already exist
- No new StateChangeOperation enum value needed
- No migration: old changelogs never contain null entries (the put crashed
before anything was written), and old ADD_OR_UPDATE_ELEMENT entries for
non-null values still restore via the unchanged path
- Semantic equivalence confirmed: putAll(singletonMap(key, null)) is equivalent
to put(key, null) (both merge/additive), verified on the Heap and RocksDB
backends
- TTL, metrics, materialization, and thread safety are all unaffected
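The routing and restore behaviour described above can be sketched with a toy
model. Everything here is hypothetical (the Op enum, the Change class, and the
in-memory log stand in for the real changelog writer and applier); it only
shows that logging null puts as ADD and replaying them with putAll() yields
the same map as the original puts:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NullRoutingSketch {

    // Hypothetical stand-ins for the two changelog operations discussed above.
    enum Op { ADD, ADD_OR_UPDATE_ELEMENT }

    // One logged change: the operation plus the entries it carries.
    static final class Change {
        final Op op;
        final Map<String, Integer> entries;

        Change(Op op, Map<String, Integer> entries) {
            this.op = op;
            this.entries = entries;
        }
    }

    final Map<String, Integer> state = new HashMap<>();
    final List<Change> log = new ArrayList<>();

    // Null values are logged as ADD (the putAll path); non-null values keep
    // using the per-element operation.
    void put(String key, Integer value) {
        state.put(key, value);
        Op op = (value == null) ? Op.ADD : Op.ADD_OR_UPDATE_ELEMENT;
        log.add(new Change(op, Collections.singletonMap(key, value)));
    }

    // Replays the log: ADD is applied with putAll, the per-element operation
    // with put. Both are additive, so the order of changes is what matters.
    static Map<String, Integer> restore(List<Change> log) {
        Map<String, Integer> restored = new HashMap<>();
        for (Change change : log) {
            if (change.op == Op.ADD) {
                restored.putAll(change.entries);
            } else {
                for (Map.Entry<String, Integer> e : change.entries.entrySet()) {
                    restored.put(e.getKey(), e.getValue());
                }
            }
        }
        return restored;
    }

    public static void main(String[] args) {
        NullRoutingSketch sketch = new NullRoutingSketch();
        sketch.put("a", 1);
        sketch.put("b", null);
        System.out.println(restore(sketch.log).equals(sketch.state));
    }
}
```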
If the analysis looks good, could this be assigned to me?
[~fanrui] [~raminqaf] could you please review?
> Change log state backend is not compatible with the case where the user value
> of the map state is null
> -------------------------------------------------------------------------------------------------------
>
> Key: FLINK-38144
> URL: https://issues.apache.org/jira/browse/FLINK-38144
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing, Runtime / State Backends
> Reporter: Rui Fan
> Priority: Minor
>
> The changelog state backend is not compatible with the case where the user
> value of the map state is null.
> Code path:
> [https://github.com/apache/flink/blob/31785e076c86d0a44e3f4a17f44a04908a2d3eb4/flink-state-backends/flink-statebackend-changelog/src/main/java/org/apache/flink/state/changelog/ChangelogMapState.java#L224]
>
> Get more details from:
> [https://github.com/apache/flink/pull/26831#discussion_r2230455833]
>