[jira] [Commented] (FLINK-24815) Reduce the cpu cost of calculating stateSize during state allocation

Yun Tang (Jira) Tue, 23 Nov 2021 18:51:28 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448334#comment-17448334
 ]


Yun Tang commented on FLINK-24815:
----------------------------------

Apart from calculating the state size lazily, I like the idea of storing the 
state size in the metadata, since we could then also know the full state size 
and show it in the UI (the state size currently calculated on the JM side is 
actually the incremental state size).

> Reduce the cpu cost of calculating stateSize during state allocation
> --------------------------------------------------------------------
>
>                 Key: FLINK-24815
>                 URL: https://issues.apache.org/jira/browse/FLINK-24815
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing, Runtime / State Backends
>    Affects Versions: 1.14.0
>            Reporter: ming li
>            Priority: Major
>
> When a task fails over, we reassign the state for each subtask and 
> create a new {{OperatorSubtaskState}} object. At that point, the {{stateSize}} 
> field of the {{OperatorSubtaskState}} is recalculated. When using 
> incremental {{Checkpoint}}, computing this field requires traversing all 
> shared states and accumulating their sizes.
> Taking a job with a parallelism of 2000 and 100 shared states per task as an 
> example, this means 2000 * 100 = 200,000 traversal steps. During that time, 
> the JM scheduling thread will max out its CPU.
> I think we can either provide a constructor that takes {{stateSize}} for 
> {{OperatorSubtaskState}} or delay the calculation of {{stateSize}}.
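
The two options proposed in the description (a constructor that accepts a 
known stateSize, and delayed calculation) could be sketched roughly as below. 
This is only an illustration under assumptions: LazySizedState is a 
hypothetical stand-in, not Flink's actual OperatorSubtaskState, and the 
shared state handles are reduced to a plain array of byte sizes. The 
traversal counter exists only to make the saved work visible.

```java
/** Hypothetical stand-in for OperatorSubtaskState; not Flink's real class. */
final class LazySizedState {
    private static final long UNSET = -1L;

    // Assumption: each entry is the size in bytes of one shared state handle.
    private final long[] sharedHandleSizes;
    private long cachedStateSize;
    private int traversals; // for illustration only

    /** Delayed calculation: nothing is summed at construction time. */
    LazySizedState(long[] sharedHandleSizes) {
        this(sharedHandleSizes, UNSET);
    }

    /** Constructor that takes an already known stateSize. */
    LazySizedState(long[] sharedHandleSizes, long knownStateSize) {
        this.sharedHandleSizes = sharedHandleSizes.clone();
        this.cachedStateSize = knownStateSize;
    }

    /** Traverses the handles at most once, on first access. Not thread-safe. */
    long getStateSize() {
        if (cachedStateSize == UNSET) {
            long sum = 0L;
            for (long size : sharedHandleSizes) {
                sum += size;
            }
            traversals++;
            cachedStateSize = sum;
        }
        return cachedStateSize;
    }

    /** Models reassignment on failover: the cached size is carried over. */
    LazySizedState reassign() {
        return new LazySizedState(sharedHandleSizes, cachedStateSize);
    }

    int traversalCount() {
        return traversals;
    }
}
```

With this shape, reassigning state creates new objects without touching the 
shared handles; the sum is computed only if something actually asks for the 
size, and a copy made after that point reuses the cached value.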



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
