[
https://issues.apache.org/jira/browse/FLINK-25328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606865#comment-17606865
]
Shammon commented on FLINK-25328:
---------------------------------
Hi [~xtsong], sorry for not following up on this issue for so long.
Based on the discussion above, I have written a design doc, [Memory Manager
Pool Design|https://docs.google.com/document/d/1y8XA7S0BK0-BfRuUp8atIutoHj-2z_5BIxTmL1EmLRM],
and I plan to implement it in the coming weeks.
Could you help review the doc? Feel free to add comments on it. Thanks!
> Improvement of reuse segments for join/agg/sort operators in TaskManager for
> flink olap queries
> -----------------------------------------------------------------------------------------------
>
> Key: FLINK-25328
> URL: https://issues.apache.org/jira/browse/FLINK-25328
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.14.0, 1.12.5, 1.13.3
> Reporter: Shammon
> Priority: Major
>
> We submit batch jobs to a Flink session cluster as OLAP queries. Because
> these jobs finish their work quickly, their subtasks are created and
> destroyed frequently in the TaskManager. Each slot in the TaskManager manages
> a `MemoryManager` for the multiple tasks of one job, and the `MemoryManager`
> is closed when all of those subtasks have finished. Operators such as join,
> aggregate, and sort allocate `MemorySegment`s via the `MemoryManager`, and
> these segments are freed when the operators finish.
>
> This causes excessive allocation and freeing of `MemorySegment`s in the
> TaskManager. For example, suppose a TaskManager has 50 slots, each slot runs
> 3 join/agg operators of a job, and each operator allocates and initializes
> 2000 segments. If the subtasks of a job take 100 ms to execute, each slot
> runs the subtasks of 10 jobs per second, so the TaskManager allocates and
> frees 2000 * 3 * 50 * 10 = 3,000,000 segments per second. Allocating and
> freeing this many segments causes two problems:
> 1) It increases the CPU usage of the TaskManager.
> 2) It increases the cost of each subtask in the TaskManager, which raises
> job latency and lowers QPS.
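
To make the estimate above easier to check, here is a small sketch of the arithmetic. The class and method names are hypothetical, and the 32 KiB segment size is an assumption (Flink's default segment size); the description itself does not state the segment size used.

```java
/** Back-of-the-envelope estimate of segment churn, using the numbers from the description. */
final class SegmentChurn {

    /** Segments allocated and freed per second across the whole TaskManager. */
    static long segmentsPerSecond(
            int slots, int operatorsPerSlot, int segmentsPerOperator, long subtaskMillis) {
        // 100 ms per subtask means each slot runs 10 rounds of subtasks per second.
        long roundsPerSecond = 1000L / subtaskMillis;
        return (long) slots * operatorsPerSlot * segmentsPerOperator * roundsPerSecond;
    }

    /** Bytes handed out and returned per second for a given segment size. */
    static long bytesPerSecond(long segmentsPerSecond, long segmentSizeBytes) {
        return segmentsPerSecond * segmentSizeBytes;
    }

    public static void main(String[] args) {
        long segments = segmentsPerSecond(50, 3, 2000, 100);
        // Assumption: 32 KiB segments (Flink's default); not stated in the issue.
        long bytes = bytesPerSecond(segments, 32L * 1024L);
        System.out.println(segments + " segments/s");
        System.out.println(bytes / (1024.0 * 1024.0 * 1024.0) + " GiB/s");
    }
}
```

Under that assumed segment size, 3,000,000 segments per second corresponds to roughly 91.6 GiB of segment memory allocated and released every second, which gives a sense of why the churn shows up in CPU usage.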
> To improve the reuse of memory segments between jobs in the same slot, we
> propose not dropping the `MemoryManager` when all the subtasks in the slot
> have finished. The slot keeps the `MemoryManager` and does not immediately
> free the `MemorySegment`s allocated in it. When subtasks of another job are
> assigned to the slot, they do not need to allocate new segments and can
> reuse the `MemoryManager` and its `MemorySegment`s. WDYT? [~xtsong] Thanks!
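
The reuse idea in the proposal can be illustrated with a minimal sketch. This is not Flink's `MemoryManager` API and not the design from the linked doc: `SlotSegmentPool` is a hypothetical class, and `ByteBuffer` stands in for `MemorySegment`. It only shows the core behavior, namely that segments released by one job stay in the slot and are handed to the next job instead of being freed.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical slot-level pool; ByteBuffer stands in for Flink's MemorySegment. */
final class SlotSegmentPool {
    private final int segmentSize;
    private final int maxSegments;
    private final ArrayDeque<ByteBuffer> idle = new ArrayDeque<>();
    private int created; // total segments ever allocated by this pool

    SlotSegmentPool(int segmentSize, int maxSegments) {
        this.segmentSize = segmentSize;
        this.maxSegments = maxSegments;
    }

    /** Hands out segments, reusing idle ones before allocating new memory. */
    synchronized List<ByteBuffer> allocate(int count) {
        int available = idle.size() + (maxSegments - created);
        if (count > available) {
            throw new IllegalStateException(
                    "Requested " + count + " segments, only " + available + " available");
        }
        List<ByteBuffer> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            ByteBuffer segment = idle.poll();
            if (segment == null) {
                segment = ByteBuffer.allocate(segmentSize);
                created++;
            }
            result.add(segment);
        }
        return result;
    }

    /** Returns segments to the pool instead of freeing them. */
    synchronized void release(List<ByteBuffer> segments) {
        for (ByteBuffer segment : segments) {
            segment.clear(); // resets position and limit; does not zero the contents
            idle.push(segment);
        }
    }

    synchronized int createdCount() {
        return created;
    }

    synchronized int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        SlotSegmentPool pool = new SlotSegmentPool(32 * 1024, 2000);
        List<ByteBuffer> jobA = pool.allocate(2000); // first job pays for allocation
        pool.release(jobA);                          // job finishes, segments stay in the slot
        List<ByteBuffer> jobB = pool.allocate(2000); // next job reuses them
        System.out.println("segments created: " + pool.createdCount());
        System.out.println("segments held by job B: " + jobB.size());
    }
}
```

The sketch leaves out everything a real implementation would need to decide, such as when idle segments are eventually returned to the system, how double releases are detected, and how the pool interacts with managed memory accounting.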
--
This message was sent by Atlassian Jira
(v8.20.10#820010)