[
https://issues.apache.org/jira/browse/FLINK-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323453#comment-17323453
]
Flink Jira Bot commented on FLINK-10576:
----------------------------------------
This issue is assigned but has not received an update in 7 days so it has been
labeled "stale-assigned". If you are still working on the issue, please give an
update and remove the label. If you are no longer working on the issue, please
unassign so someone else may work on it. In 7 days the issue will be
automatically unassigned.
> Introduce Machine/Node/TM health management
> -------------------------------------------
>
> Key: FLINK-10576
> URL: https://issues.apache.org/jira/browse/FLINK-10576
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Coordination
> Reporter: JIN SUN
> Assignee: ryantaocer
> Priority: Major
> Labels: stale-assigned
>
> When a task fails, we can identify whether the failure was caused by an
> environment issue. In particular, when multiple tasks report environment errors
> from the same TM/Machine/Node, there is a high probability that this TM has a
> problem. Likewise, if we find that multiple tasks have become slow on a certain
> node, we should put the machine on probation:
> * avoid scheduling new tasks to it
> * release the TaskManager once all of its tasks are drained, and allocate a new
> one if needed
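The probation policy in the issue description can be sketched roughly as follows. This is a minimal, hypothetical Java sketch, not Flink's implementation: the class name TaskManagerHealthTracker, its methods, the use of plain string TaskManager IDs, and the simple count-based thresholds are all illustrative assumptions. It counts environment-related failures and slow-task reports per TaskManager, puts a TaskManager on probation once either count reaches its threshold, stops scheduling onto it, and reports it as releasable once its running tasks are drained.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; not a Flink API. Thresholds and names are assumptions.
class TaskManagerHealthTracker {
    private final int failureThreshold;   // env failures before probation
    private final int slowTaskThreshold;  // slow-task reports before probation
    private final Map<String, Integer> envFailures = new HashMap<>();
    private final Map<String, Integer> slowTasks = new HashMap<>();
    private final Map<String, Integer> runningTasks = new HashMap<>();
    private final Set<String> onProbation = new HashSet<>();

    TaskManagerHealthTracker(int failureThreshold, int slowTaskThreshold) {
        this.failureThreshold = failureThreshold;
        this.slowTaskThreshold = slowTaskThreshold;
    }

    // Record a task failure that was attributed to the environment (e.g. disk/network).
    void reportEnvironmentFailure(String tmId) {
        int count = envFailures.merge(tmId, 1, Integer::sum);
        if (count >= failureThreshold) {
            onProbation.add(tmId);
        }
    }

    // Record that a task running on this TaskManager was detected as slow.
    void reportSlowTask(String tmId) {
        int count = slowTasks.merge(tmId, 1, Integer::sum);
        if (count >= slowTaskThreshold) {
            onProbation.add(tmId);
        }
    }

    // Scheduler check: avoid placing new tasks on a TaskManager under probation.
    boolean canSchedule(String tmId) {
        return !onProbation.contains(tmId);
    }

    void taskStarted(String tmId) {
        runningTasks.merge(tmId, 1, Integer::sum);
    }

    void taskFinished(String tmId) {
        runningTasks.merge(tmId, -1, Integer::sum);
    }

    // A TaskManager on probation can be released once all of its tasks have drained.
    boolean shouldRelease(String tmId) {
        return onProbation.contains(tmId) && runningTasks.getOrDefault(tmId, 0) == 0;
    }

    public static void main(String[] args) {
        TaskManagerHealthTracker tracker = new TaskManagerHealthTracker(3, 5);
        tracker.taskStarted("tm-1");
        for (int i = 0; i < 3; i++) {
            tracker.reportEnvironmentFailure("tm-1");
        }
        System.out.println(tracker.canSchedule("tm-1"));   // false: on probation
        System.out.println(tracker.shouldRelease("tm-1")); // false: one task still running
        tracker.taskFinished("tm-1");
        System.out.println(tracker.shouldRelease("tm-1")); // true: drained
    }
}
```

Plain counters are the simplest possible policy; a real health manager would likely also need time windows so that old failures expire, plus a way to bring a recovered node back, both of which this sketch omits.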
--
This message was sent by Atlassian Jira
(v8.3.4#803005)