[ 
https://issues.apache.org/jira/browse/FLINK-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006609#comment-17006609
 ] 

Victor Wong commented on FLINK-15448:
-------------------------------------

[~xintongsong], thanks for your reply. Your concern is very reasonable, but I still have some questions.

_the right approach should be providing more information at the places where the logs are generated_
---
The information we need, such as the host of the TM, is not always available, or convenient to access, at the place where the log message is built. Take `xxxxHeartbeatListener` for example:

{code:java}
// org.apache.flink.runtime.jobmaster.JobMaster.TaskManagerHeartbeatListener#notifyHeartbeatTimeout
public void notifyHeartbeatTimeout(ResourceID resourceID) {
    validateRunsInMainThread();
    // Only the ResourceID is available here, so it is not easy to build
    // a log message that also includes the TaskManager's host.
    disconnectTaskManager(
        resourceID,
        new TimeoutException("Heartbeat of TaskManager with id " + resourceID + " timed out."));
}
{code}

Besides, I think relying on every developer to remember to include exactly the right information at each logging call is error-prone.

What about initializing the Yarn ResourceID with both the container ID and the host information, i.e. `new ResourceID(container.getId().toString() + "@" + container.getNodeId())`? Any suggestions?
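A minimal sketch of the proposal, using a hypothetical stand-in class `DemoResourceId` instead of Flink's real `ResourceID` (assumed here to compare by its string value), with plain strings standing in for YARN's `ContainerId` and `NodeId`. The container ID and `host:port` values are made up. It shows the trade-off: the existing timeout message gains the host for free, but the host also becomes part of the ID's identity, so an ID rebuilt from the container ID alone would no longer be equal.

{code:java}
import java.util.Objects;

// Hypothetical stand-in for a ResourceID that compares by its string value.
final class DemoResourceId {
    private final String value;

    DemoResourceId(String value) {
        this.value = Objects.requireNonNull(value);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DemoResourceId && value.equals(((DemoResourceId) o).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}

class ProposalSketch {
    // Mirrors the proposed container.getId().toString() + "@" + container.getNodeId(),
    // with plain strings standing in for YARN's ContainerId and NodeId.
    static DemoResourceId fromContainer(String containerId, String nodeId) {
        return new DemoResourceId(containerId + "@" + nodeId);
    }

    public static void main(String[] args) {
        DemoResourceId id = fromContainer("container_e01_0001_01_000002", "worker-3:45454");
        // The existing timeout message now carries the host without further changes.
        System.out.println("Heartbeat of TaskManager with id " + id + " timed out.");
        // But an ID rebuilt from the bare container ID no longer matches: prints false.
        System.out.println(id.equals(new DemoResourceId("container_e01_0001_01_000002")));
    }
}
{code}

If any other code path (hypothetically, a container-completed callback) creates a ResourceID from the container ID alone, it would have to switch to the same combined format to keep matching.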

> Make "ResourceID#toString" more descriptive
> -------------------------------------------
>
>                 Key: FLINK-15448
>                 URL: https://issues.apache.org/jira/browse/FLINK-15448
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.9.1
>            Reporter: Victor Wong
>            Priority: Major
>
> With Flink on Yarn, we sometimes run into an exception like this:
> {code:java}
> java.util.concurrent.TimeoutException: The heartbeat of TaskManager with id container_xxxx timed out.
> {code}
> To find the host of the lost TaskManager and log into it for more details, we have to search the earlier logs for the host information, which is time-consuming.
> Maybe we can add more descriptive information to the ResourceID of Yarn containers, e.g. "container_xxx@host_name:port_number".
> Here's a demo:
> {code:java}
> class ResourceID {
>   private final String resourceId;
>   private final String details;
>
>   public ResourceID(String resourceId) {
>     this.resourceId = resourceId;
>     this.details = resourceId;
>   }
>
>   public ResourceID(String resourceId, String details) {
>     this.resourceId = resourceId;
>     this.details = details;
>   }
>
>   @Override
>   public String toString() {
>     return details;
>   }
> }
>
> // in flink-yarn
> private void startTaskExecutorInContainer(Container container) {
>   final String containerIdStr = container.getId().toString();
>   final String containerDetail = container.getId() + "@" + container.getNodeId();
>   final ResourceID resourceId = new ResourceID(containerIdStr, containerDetail);
>   // ...
> }
> {code}
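For the issue's two-field demo to leave identity untouched, `equals` and `hashCode` would have to use only `resourceId`, while `toString` returns `details`. Below is a minimal self-contained sketch of that split; the class names `DescriptiveResourceId` and `DescriptiveResourceIdDemo` and the sample values are hypothetical, and it assumes, as the demo does, that the extra details are for display only.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: identity comes from resourceId, toString() from details.
final class DescriptiveResourceId {
    private final String resourceId;
    private final String details;

    DescriptiveResourceId(String resourceId) {
        this(resourceId, resourceId);
    }

    DescriptiveResourceId(String resourceId, String details) {
        this.resourceId = Objects.requireNonNull(resourceId);
        this.details = Objects.requireNonNull(details);
    }

    @Override
    public boolean equals(Object o) {
        // details are deliberately ignored so they never affect identity.
        return o instanceof DescriptiveResourceId
                && resourceId.equals(((DescriptiveResourceId) o).resourceId);
    }

    @Override
    public int hashCode() {
        return resourceId.hashCode();
    }

    @Override
    public String toString() {
        return details;
    }
}

class DescriptiveResourceIdDemo {
    public static void main(String[] args) {
        DescriptiveResourceId registered = new DescriptiveResourceId(
                "container_e01_0001_01_000002",
                "container_e01_0001_01_000002@worker-3:45454");
        Map<DescriptiveResourceId, String> taskManagers = new HashMap<>();
        taskManagers.put(registered, "registered");

        // A key built from the bare container ID still finds the entry: prints "registered".
        System.out.println(taskManagers.get(new DescriptiveResourceId("container_e01_0001_01_000002")));
        // Log messages built via toString() show the host.
        System.out.println("Heartbeat of TaskManager with id " + registered + " timed out.");
    }
}
{code}

The design choice here is that the host string only affects how the ID is printed, so maps and comparisons keyed on the container ID keep working unchanged.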



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
