[ 
https://issues.apache.org/jira/browse/FLINK-34567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yamanda updated FLINK-34567:
----------------------------
    Description: 
I deployed a Flink cluster (version: 1.16.2) and it ran normally for about 2 
months, but recently I ran into a problem. Some subtasks show high back 
pressure and the Flink job is completely blocked (see pic1.jpg). These subtasks 
are all on one task manager. After I stopped the abnormal task manager and 
redeployed the Flink job, the problem went away. I found the following error 
log on the abnormal task manager:

2024-03-03 15:57:25,088 ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue [] - Encountered error while consuming partitions
org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
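
To check whether this error is confined to one task manager, each task 
manager's log file could be scanned with a small script. The following is only 
a sketch: the function name count_partition_errors is hypothetical, and it 
assumes that the header of each log entry (timestamp, level, logger, message) 
sits on a single line in the log file.

```python
import re
from collections import Counter

# Header of a log entry, assumed to be on one line in the log file:
# "<date> <time>,<millis> <LEVEL> <logger> [] - <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+\[\]\s+-\s+(?P<msg>.*)$"
)


def count_partition_errors(lines):
    """Count ERROR entries logged by PartitionRequestQueue, grouped by message."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            # Continuation lines (exception text, stack traces) have no header.
            continue
        if match["level"] == "ERROR" and match["logger"].endswith("PartitionRequestQueue"):
            counts[match["msg"]] += 1
    return counts
```

Running it over the log of every task manager and comparing the counts would 
show whether only the abnormal task manager reports this error.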

I checked the machine that hosts the abnormal task manager. Its CPU, memory, 
and network usage look the same as on the machines that host the other task 
managers, so it does not look like a hardware problem.

What does this error mean?

What should I do to solve this problem completely?

  was:
I deployed a Flink cluster and it ran normally for about 2 months, but recently 
I ran into a problem. Some subtasks show high back pressure and the Flink job 
is completely blocked (see pic1.jpg). These subtasks are all on one task 
manager. After I stopped the abnormal task manager and redeployed the Flink 
job, the problem went away. I found the following error log on the abnormal 
task manager:

2024-03-03 15:57:25,088 ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue [] - Encountered error while consuming partitions
org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out

What does this error mean? What should I do to solve this problem completely?


> flink task manager error occur, msg: Encountered error while consuming 
> partitions
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-34567
>                 URL: https://issues.apache.org/jira/browse/FLINK-34567
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.16.2
>            Reporter: yamanda
>            Priority: Major
>              Labels: flink
>         Attachments: pic1.jpg
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> I deployed a Flink cluster (version: 1.16.2) and it ran normally for about 2 
> months, but recently I ran into a problem. Some subtasks show high back 
> pressure and the Flink job is completely blocked (see pic1.jpg). These 
> subtasks are all on one task manager. After I stopped the abnormal task 
> manager and redeployed the Flink job, the problem went away. I found the 
> following error log on the abnormal task manager:
> 2024-03-03 15:57:25,088 ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue [] - Encountered error while consuming partitions
> org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
> I checked the machine that hosts the abnormal task manager. Its CPU, memory, 
> and network usage look the same as on the machines that host the other task 
> managers, so it does not look like a hardware problem.
> What does this error mean?
> What should I do to solve this problem completely?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
