[ https://issues.apache.org/jira/browse/KUDU-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Song Jiacheng updated KUDU-3516:
--------------------------------
Description:
If we have many tables with many columns and each of them gets many update
requests, the maintenance scheduler gets stuck calculating the perf
improvement score of major compaction.
This tablet server has 6 maintenance manager threads but can only schedule
1 or 2 tasks at a time, even when the tablet server is actually under high
memory pressure.
!image-2023-10-09-15-59-01-026.png|width=655,height=327!
According to the stack shown below, the scheduler was stuck in
AddColumnIdsWithUpdates for a long time, even though there is no need to
collect all the updated columns here.
!image-2023-10-09-15-58-47-267.png|width=690,height=218!
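To illustrate the point above, here is a minimal C++ sketch. It is not Kudu's
actual code: SketchDeltaStats, its update_counts_by_col_id member and
HasAnyColumnUpdate are hypothetical names, and it assumes the caller only
needs to know whether any column has updates, which the stack trace alone
does not confirm. It contrasts collecting every updated column ID into a set
with a check that stops at the first updated column.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <unordered_map>

// Hypothetical stand-in for per-delta-file statistics: maps a column ID to
// the number of updates recorded for that column. Not Kudu's real class.
struct SketchDeltaStats {
  std::unordered_map<int32_t, int64_t> update_counts_by_col_id;

  // Costly pattern: copy every updated column ID into a caller-owned set.
  // Each call walks all columns and does an ordered-set insertion per hit.
  void AddColumnIdsWithUpdates(std::set<int32_t>* col_ids) const {
    for (const auto& entry : update_counts_by_col_id) {
      if (entry.second > 0) {
        col_ids->insert(entry.first);
      }
    }
  }

  // Cheaper pattern (assumed sufficient for the caller): report whether any
  // column has updates, returning at the first one found.
  bool HasAnyColumnUpdate() const {
    for (const auto& entry : update_counts_by_col_id) {
      if (entry.second > 0) {
        return true;
      }
    }
    return false;
  }
};

// Small self-check showing both queries agree on whether updates exist.
inline void RunExample() {
  SketchDeltaStats stats;
  stats.update_counts_by_col_id = {{1, 0}, {2, 5}, {3, 7}};

  std::set<int32_t> ids;
  stats.AddColumnIdsWithUpdates(&ids);
  assert(ids.size() == 2);  // columns 2 and 3 have updates
  assert(stats.HasAnyColumnUpdate());
}
```

With many wide tables, the first pattern repeats the full walk and the set
insertions for every delta store each time the scheduler scores an op, while
the second can return as soon as one updated column is seen.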
was:
If we have many tables with many columns and each of them get many update
requests, the maintenance scheduler will stuck in calculating the perf
improvement score of major compaction.
This tablet server has 6 maintenance manager but could only schedule 1 or 2
tasks at one time, even if the tablet server is actually under high memory
pressure.
!image-2023-10-09-15-59-01-026.png|width=655,height=327!
According to the stack showed below, I found out the scheduler stuck in
AddColumnIdsWithUpdates for a long time, but there is no need to get all the
updated columns here.
!image-2023-10-09-15-58-47-267.png|width=690,height=218!
> Tserver: Maintenance scheduler might be stuck in
> DeltaStats#AddColumnIdsWithUpdates
> ------------------------------------------------------------------------------------
>
> Key: KUDU-3516
> URL: https://issues.apache.org/jira/browse/KUDU-3516
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Reporter: Song Jiacheng
> Priority: Major
> Attachments: image-2023-10-09-15-58-47-267.png,
> image-2023-10-09-15-59-01-026.png
>
>
> If we have many tables with many columns and each of them gets many update
> requests, the maintenance scheduler gets stuck calculating the perf
> improvement score of major compaction.
> This tablet server has 6 maintenance manager threads but can only schedule
> 1 or 2 tasks at a time, even when the tablet server is actually under high
> memory pressure.
> !image-2023-10-09-15-59-01-026.png|width=655,height=327!
> According to the stack shown below, the scheduler was stuck in
> AddColumnIdsWithUpdates for a long time, even though there is no need to
> collect all the updated columns here.
> !image-2023-10-09-15-58-47-267.png|width=690,height=218!