[
https://issues.apache.org/jira/browse/FLINK-28838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577116#comment-17577116
]
Qingsheng Ren edited comment on FLINK-28838 at 8/9/22 3:30 AM:
---------------------------------------------------------------
Thanks for the ticket [~aitozi]! Yeah we can definitely make some improvement
as not all source implementations works as expected.
I think your first proposal make sense to me. We can drop empty records earlier
before putting into elementsQueue. I have some concerns about the second one
(adding SleepTask) as we can hardly decide the length of sleep considering
source implementations differ a lot. For example KafkaConsumer itself has
ability to block the thread if no data is available for polling so it doesn't
need the SleepTask at all. I prefer to leave it to split reader implementation
itself as the doc of {{SplitReader#fetch}} is quite clear that it could be a
blocking call. WDYT?
BTW which source has this issue? We can check its implementation too.
was (Author: renqs):
Thanks for the ticket [~aitozi]! Yeah we can definitely make some improvement
as not all source implementations works as expected.
I think your first proposal make sense to me. We can drop empty records earlier
before putting into elementsQueue. I have some concerns about the second one
(adding SleepTask) as we can hardly decide the length of sleep considering
source implementations vary a lot. For example KafkaConsumer itself has ability
to block the thread if no data is available for polling so it doesn't need the
SleepTask at all. I prefer to leave it to split reader implementation itself as
the doc of {{SplitReader#fetch}} is quite clear that it could be a blocking
call. WDYT?
BTW which source has this issue? We can check its implementation too.
> Avoid to notify the elementQueue consumer when the fetch result is empty
> ------------------------------------------------------------------------
>
> Key: FLINK-28838
> URL: https://issues.apache.org/jira/browse/FLINK-28838
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Common
> Affects Versions: 1.15.0, 1.15.1
> Reporter: Aitozi
> Priority: Major
> Fix For: 1.16.0
>
> Attachments: 20220805165441.jpg
>
>
> When using the new source api, I found that if the source has no data, it
> still brings high cpu usage.
> The reason behind this is that it will always return the
> {{RecordsWithSplitIds}} from the {{splitReader.fetch}} in FetchTask and it
> will be added to the elementQueue. It will make the consumer be notified to
> wake up frequently.
> This causes the thread to keep busy to run and wake up, which leads to the
> high sys and user cpu usage.
> I think not all the SplitReader#fetch will block until there is data, if it
> returns immediately when there is no data, then this problem will happen
--
This message was sent by Atlassian Jira
(v8.20.10#820010)