[https://issues.apache.org/jira/browse/FLINK-32127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724083#comment-17724083]
Wencong Liu commented on FLINK-32127:
-------------------------------------
Hi [~Zhanghao Chen], thanks for the proposal! I think the key point of this
issue is that CPU-consuming operations (compression/decompression, for
example) should be regarded as part of data processing in the task executor;
only then will the busy percent of the source task be accurate. A second key
point, therefore, is how to unify the busy-percent computation logic for
source and non-source tasks. 🤔
> Source busy time is inaccurate in many cases
> --------------------------------------------
>
> Key: FLINK-32127
> URL: https://issues.apache.org/jira/browse/FLINK-32127
> Project: Flink
> Issue Type: Improvement
> Components: Autoscaler
> Reporter: Zhanghao Chen
> Priority: Major
>
> We found that source busy time is inaccurate in many cases. The reason is
> that sources are usually multi-threaded (Kafka and RocketMQ, for example):
> a fetcher thread pulls data from the external source, and a consumer thread
> deserializes it, with a blocking queue in between. A source is considered
> # *idle* if the consumer is blocked fetching data from the queue
> # *backpressured* if the consumer is blocked writing data to downstream
> operators
> # *busy* otherwise
> However, this means that when the bottleneck is on the fetcher side, the
> consumer is often blocked fetching data from the queue, so the reported
> source idle time is high even though the source is actually busy and
> consuming a lot of CPU. In some of our jobs, the source's max busy time is
> only ~600 ms while it has actually reached its limit.
> The bottleneck can be on the fetcher side, for example, when Kafka enables
> zstd compression: decompression on the fetcher side can be quite heavy
> compared to the deserialization done on the consumer thread.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)