[ 
https://issues.apache.org/jira/browse/FLINK-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010320#comment-17010320
 ] 

Zhu Zhu commented on FLINK-15448:
---------------------------------

[~xintongsong] Agreed that we should take great care when changing the IDs.
However, since the change is not user visible, a FLIP may not really be needed.
In my mind these changes can also be separate pieces. For example,
ExecutionAttemptID and IntermediateResultPartitionID are not closely related to
the ResourceID changes.
Maybe we can have an umbrella ticket to track all these tasks. Each task should
then be responsible for carefully designing the changes to the ID it aims to
improve, and each task can have a separate ML discussion when necessary.

> Log host informations for TaskManager failures.
> -----------------------------------------------
>
>                 Key: FLINK-15448
>                 URL: https://issues.apache.org/jira/browse/FLINK-15448
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.1
>            Reporter: Victor Wong
>            Assignee: Victor Wong
>            Priority: Minor
>
> With Flink on Yarn, sometimes we ran into an exception like this:
> {code:java}
> java.util.concurrent.TimeoutException: The heartbeat of TaskManager with id 
> container_xxxx  timed out.
> {code}
> We'd like to find out the host of the lost TaskManager so that we can log
> into it for more details. Currently we have to check the previous logs for
> the host information, which is a little time-consuming.
> Maybe we can add more descriptive information to the ResourceID of Yarn
> containers, e.g. "container_xxx@host_name:port_number".
> Here's the demo:
> {code:java}
> class ResourceID {
>   // Identifier of the resource, e.g. the Yarn container ID.
>   final String resourceId;
>   // Human-readable description used for logging; defaults to the ID itself.
>   final String details;
>   public ResourceID(String resourceId) {
>     this(resourceId, resourceId);
>   }
>   public ResourceID(String resourceId, String details) {
>     this.resourceId = resourceId;
>     this.details = details;
>   }
>   @Override
>   public String toString() {
>     return details;
>   }
> }
> // in flink-yarn
> private void startTaskExecutorInContainer(Container container) {
>   final String containerIdStr = container.getId().toString();
>   // Append the node ID so that log messages reveal the container's host.
>   final String containerDetail =
>       container.getId() + "@" + container.getNodeId();
>   final ResourceID resourceId =
>       new ResourceID(containerIdStr, containerDetail);
>   ...
> }
> {code}
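
One point the demo leaves open is whether the extra details should take part in the ID's identity. A minimal sketch, assuming they should not (so that lookups keyed by the plain container ID keep working): the class name ResourceIdSketch, the getter, and the equals/hashCode choice are hypothetical and are not taken from Flink's actual ResourceID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for the demo's ResourceID; not Flink's actual class. */
final class ResourceIdSketch {
    private final String resourceId;
    private final String details;

    ResourceIdSketch(String resourceId) {
        this(resourceId, resourceId);
    }

    ResourceIdSketch(String resourceId, String details) {
        this.resourceId = Objects.requireNonNull(resourceId);
        this.details = Objects.requireNonNull(details);
    }

    String getResourceId() {
        return resourceId;
    }

    // Assumption: identity depends on the ID only, never on the details.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ResourceIdSketch)) {
            return false;
        }
        return resourceId.equals(((ResourceIdSketch) o).resourceId);
    }

    @Override
    public int hashCode() {
        return resourceId.hashCode();
    }

    // Log messages that print the ID show the descriptive form.
    @Override
    public String toString() {
        return details;
    }
}

class ResourceIdSketchDemo {
    public static void main(String[] args) {
        ResourceIdSketch detailed =
            new ResourceIdSketch("container_1", "container_1@host-a:8041");
        Map<ResourceIdSketch, String> registry = new HashMap<>();
        registry.put(detailed, "registered");

        // A lookup with the plain ID still finds the detailed entry.
        System.out.println(registry.get(new ResourceIdSketch("container_1")));
        // The timeout message would now carry the host information.
        System.out.println("The heartbeat of TaskManager with id " + detailed + " timed out.");
    }
}
```

Under this assumption, the details string only changes what gets logged; any map or set keyed by the ID behaves as before.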



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
