[ https://issues.apache.org/jira/browse/FLINK-38144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087901#comment-18087901 ]

Mukul Gupta commented on FLINK-38144:
-------------------------------------

Hi, I'd like to work on this issue.

Problem: ChangelogMapState.put(key, null) and entry.setValue(null) both call
getWriter(key, value), which directly invokes valueSerializer.serialize(null,
out). This throws an NPE, because many Flink serializers (e.g. IntSerializer,
LongSerializer) are not null-safe. Both call sites of getWriter() in
ChangelogMapState.java are affected.
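To make the failure mode concrete, here is a minimal, self-contained sketch that uses plain java.io streams instead of Flink's DataOutputView. NullUnsafeIntSerializer and NullValueNpeDemo are hypothetical names; the serializer only mimics the "write the unboxed int" behavior assumed for IntSerializer, so a null value fails before any bytes are written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a null-unsafe serializer: it writes the unboxed
// int (the behavior assumed for IntSerializer), so a null record cannot be written.
final class NullUnsafeIntSerializer {
    void serialize(Integer record, DataOutputStream out) throws IOException {
        out.writeInt(record); // auto-unboxing throws NullPointerException for null
    }
}

final class NullValueNpeDemo {
    // Mirrors the failing pattern: the map value goes straight to the value
    // serializer with no null check. Returns true if serialization hit an NPE.
    static boolean serializeThrowsNpe(Integer value) {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            new NullUnsafeIntSerializer().serialize(value, out);
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeThrowsNpe(42));   // false
        System.out.println(serializeThrowsNpe(null)); // true
    }
}
```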

Previously proposed approaches (from the PR #26831 discussion):

1. A new StateMetaInfoReader version that encodes null tolerance: over-engineered,
   and it affects all backends.
2. A new null-tolerant StateChangeOperation alongside ADD_OR_UPDATE_ELEMENT:
   risks breaking backward compatibility on downgrade.
3. Ignore the problem.

All three were deferred, because options 1 and 2 "break the serialization format
and introduce migration issues."

Proposed fix: route put(key, null) and entry.setValue(null) through the existing
ADD operation path (already used by putAll()) by calling
changeLogger.valueAdded(Collections.singletonMap(key, null), ns). The ADD path
serializes via MapSerializer.serialize(), which already writes a boolean null
flag before each value. On restore, MapStateChangeApplier handles ADD by calling
mapState.putAll(mapSerializer.deserialize(in)), which correctly reconstructs
null values.
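The null-flag encoding that the ADD path relies on can be sketched as follows. This is a hypothetical codec (NullFlagMapCodec, String keys, Integer values, plain java.io streams) that assumes the layout described for MapSerializer: the entry count, then for each entry the key, a boolean null flag, and the value only when the flag is false. It is not Flink's MapSerializer; it only illustrates why a null value never reaches the value serializer, and how replaying the entry with putAll() restores it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical codec mirroring the assumed MapSerializer layout:
// size, then per entry: key, isNull flag, value (only if not null).
final class NullFlagMapCodec {
    static byte[] serialize(Map<String, Integer> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(map.size());
            for (Map.Entry<String, Integer> entry : map.entrySet()) {
                out.writeUTF(entry.getKey());
                if (entry.getValue() == null) {
                    out.writeBoolean(true);         // null flag set: no value bytes follow
                } else {
                    out.writeBoolean(false);
                    out.writeInt(entry.getValue()); // only non-null values are written
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Map<String, Integer> deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            Map<String, Integer> map = new HashMap<>();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                boolean isNull = in.readBoolean();
                map.put(key, isNull ? null : Integer.valueOf(in.readInt()));
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // put("k", null) logged as an ADD of a one-entry map.
        byte[] logged = serialize(Collections.<String, Integer>singletonMap("k", null));
        // Restore: an earlier value for "k" exists; replaying the ADD overwrites it.
        Map<String, Integer> restored = new HashMap<>();
        restored.put("k", 7);
        restored.putAll(deserialize(logged));
        System.out.println(restored.containsKey("k") + " " + restored.get("k")); // true null
    }
}
```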

Why this avoids the previously raised concerns:

- No serialization format change: the ADD operation and MapSerializer's null-flag
  protocol already exist.
- No new StateChangeOperation enum value is needed.
- No migration: old changelogs never contain null entries (the job failed before
  they could be written), and existing ADD_OR_UPDATE_ELEMENT entries for non-null
  values still restore via the unchanged path.
- Semantic equivalence confirmed: putAll(singletonMap(key, null)) is identical to
  put(key, null), since both merge into the existing map rather than replacing it;
  verified on the Heap and RocksDB backends.
- TTL, metrics, materialization, and thread safety are all unaffected.
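The routing and the equivalence argument can be modeled with an in-memory changelog. Everything here is hypothetical (RoutingSketch, its Op enum and Change class): the operation names echo ADD and ADD_OR_UPDATE_ELEMENT, but this is not Flink's StateChangeOperation or ChangelogMapState. put() logs a null value as a one-entry ADD, and replay() applies ADD with putAll() and element updates with put(), so the restored map should match the live one, including a key overwritten with null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the proposed routing; not Flink code.
final class RoutingSketch {
    enum Op { ADD, ADD_OR_UPDATE_ELEMENT }

    static final class Change {
        final Op op;
        final Map<String, Integer> entries;

        Change(Op op, Map<String, Integer> entries) {
            this.op = op;
            this.entries = entries;
        }
    }

    final Map<String, Integer> state = new HashMap<>();
    final List<Change> log = new ArrayList<>();

    // Proposed put(): a null value is logged via the ADD path (as putAll would),
    // so no per-element value serializer is ever handed a null.
    void put(String key, Integer value) {
        state.put(key, value);
        if (value == null) {
            log.add(new Change(Op.ADD, Collections.<String, Integer>singletonMap(key, null)));
        } else {
            log.add(new Change(Op.ADD_OR_UPDATE_ELEMENT, Collections.singletonMap(key, value)));
        }
    }

    // Replay on restore: ADD -> putAll, ADD_OR_UPDATE_ELEMENT -> put.
    static Map<String, Integer> replay(List<Change> log) {
        Map<String, Integer> restored = new HashMap<>();
        for (Change change : log) {
            if (change.op == Op.ADD) {
                restored.putAll(change.entries);
            } else {
                Map.Entry<String, Integer> e = change.entries.entrySet().iterator().next();
                restored.put(e.getKey(), e.getValue());
            }
        }
        return restored;
    }

    public static void main(String[] args) {
        RoutingSketch sketch = new RoutingSketch();
        sketch.put("a", 1);
        sketch.put("a", null); // overwrite an existing value with null
        sketch.put("b", null);
        sketch.put("c", 3);
        System.out.println(replay(sketch.log).equals(sketch.state)); // true
        System.out.println(sketch.log.get(1).op);                    // ADD
    }
}
```

Because java.util.HashMap stores null values, put(k, null) and putAll(singletonMap(k, null)) leave the same mapping in this model. Whether the real Heap and RocksDB backends behave identically is what the comment reports having verified, not something this sketch checks.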

If the analysis looks good, could this be assigned to me?
[~fanrui], could you review?

> Change log state backend is not compatible with the case where the user value 
> of the map state is null 
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-38144
>                 URL: https://issues.apache.org/jira/browse/FLINK-38144
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / State Backends
>            Reporter: Rui Fan
>            Priority: Minor
>
> change log state backend is not compatible with the case where the user value 
> of the map state is null. 
> Code path: 
> [https://github.com/apache/flink/blob/31785e076c86d0a44e3f4a17f44a04908a2d3eb4/flink-state-backends/flink-statebackend-changelog/src/main/java/org/apache/flink/state/changelog/ChangelogMapState.java#L224]
>  
> Get more details from: 
> [https://github.com/apache/flink/pull/26831#discussion_r2230455833]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
