[ https://issues.apache.org/jira/browse/FLINK-34567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
yamanda updated FLINK-34567:
----------------------------
Description:
I deployed a Flink cluster (version 1.16.2) and it ran normally for about 2
months, but recently I ran into a problem. Some subtasks show high back pressure
and the Flink job is completely blocked (see pic1.jpg). These subtasks are all
on the same TaskManager. I stopped the abnormal TaskManager and redeployed the
Flink job, and that solved the problem. I found the following error in the log
of the abnormal TaskManager:
2024-03-03 15:57:25,088 ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue [] - Encountered error while consuming partitions
org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
I checked the machine hosting the abnormal TaskManager. Its CPU, memory, and
network usage look as normal as on the machines hosting the other TaskManagers,
so it doesn't seem to be a hardware problem.
What does this error mean?
What should I do to solve this problem completely?
was:
I deployed a Flink cluster and it ran normally for about 2 months, but recently
I ran into a problem. Some subtasks show high back pressure and the Flink job is
completely blocked (see pic1.jpg). These subtasks are all on the same
TaskManager. I stopped the abnormal TaskManager and redeployed the Flink job,
and that solved the problem. I found the following error in the log of the
abnormal TaskManager:
2024-03-03 15:57:25,088 ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue [] - Encountered error while consuming partitions
org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
What does this error mean? What should I do to solve this problem completely?
> Flink TaskManager error occurs, msg: Encountered error while consuming partitions
> ---------------------------------------------------------------------------------
>
> Key: FLINK-34567
> URL: https://issues.apache.org/jira/browse/FLINK-34567
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.16.2
> Reporter: yamanda
> Priority: Major
> Labels: flink
> Attachments: pic1.jpg
>
> Original Estimate: 96h
> Remaining Estimate: 96h
--
This message was sent by Atlassian Jira
(v8.20.10#820010)