[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151004#comment-15151004
 ] 

Jamie Grier commented on FLINK-1502:
------------------------------------

I understand [~eastcirclek]'s points about using the InstanceID.  This is a 
unique ID that is automatically generated (I believe).  As such, if you use it 
to namespace the metrics, you will see new metric names whenever new 
TaskManagers are created.  Over time this means the total number of metrics 
will grow and grow.  In my experience it would be better to have a "logical" 
ID for each TaskManager in the cluster, literally 1, 2, 3, 4, etc., and use 
this value to namespace the metrics.  This will provide better continuity over 
time as TaskManagers come up and down.  However, I don't know if this concept 
actually exists inside Flink at the moment.  Does it?

I would suggest we use logical ids/indexes for TaskManager level metrics, as 
well as task level metrics, etc, as opposed to UUIDs.

So rather than:

    taskmanager.<TASK_MANAGER_UUID_1>.gc_time
    taskmanager.<TASK_MANAGER_UUID_2>.gc_time

and

    task.<TASK_UUID_1>.flatMap.messagesReceived
    task.<TASK_UUID_2>.flatMap.messagesReceived

I would suggest something like

    cluster.<CLUSTER_NAME>.taskmanager.1.gc_time
    cluster.<CLUSTER_NAME>.taskmanager.2.gc_time

and

    cluster.<CLUSTER_NAME>.task.1.flatMap.messagesReceived
    cluster.<CLUSTER_NAME>.task.2.flatMap.messagesReceived
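
To make the proposed naming concrete, here is a minimal sketch of how such 
names could be assembled.  The helper `metric_name` and the cluster name 
"my-yarn-session" are hypothetical and only illustrate the layout above; 
nothing here is an existing Flink API.

```python
def metric_name(cluster_name, scope, logical_id, *parts):
    """Build a dotted name like cluster.<name>.taskmanager.1.gc_time.

    Hypothetical helper: scope is "taskmanager" or "task", and logical_id
    is the small stable index (1, 2, 3, ...) rather than a UUID.
    """
    if logical_id < 1:
        raise ValueError("logical ids start at 1")
    segments = ["cluster", cluster_name, scope, str(logical_id), *parts]
    # A dot separates levels of the name, so replace dots inside a segment.
    return ".".join(segment.replace(".", "_") for segment in segments)


print(metric_name("my-yarn-session", "taskmanager", 1, "gc_time"))
# cluster.my-yarn-session.taskmanager.1.gc_time
print(metric_name("my-yarn-session", "task", 2, "flatMap", "messagesReceived"))
# cluster.my-yarn-session.task.2.flatMap.messagesReceived
```

Replacing dots inside a segment is just one possible choice; the point is 
that every part of the name stays stable across TaskManager restarts.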

I hope that makes sense.  The main point is to use logical IDs wherever 
possible, especially for things that change; otherwise there will be a lack of 
continuity in the metrics.  Also, I don't know that we actually have the 
CLUSTER_NAME concept right now either, but we might need it.  It would be 
unique for any given YarnSession if running on YARN, for example.  Basically we 
just need some way to group a set of TaskManagers uniquely.  I guess this could 
also be done by using the UUID of the JobManager.
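
One way the logical IDs could be handed out is sketched below, assuming some 
central component (for example the JobManager) tracks which TaskManagers are 
alive.  The class `LogicalIdPool` and its methods are hypothetical and not 
part of Flink.  A departing TaskManager returns its index, and the next one 
to register reuses the lowest free index, so the set of metric names stays 
bounded.

```python
import heapq
import uuid


class LogicalIdPool:
    """Maps generated instance ids to small, reusable logical indexes."""

    def __init__(self):
        self._free = []        # min-heap of indexes released by dead instances
        self._next = 1         # next never-used index
        self._assigned = {}    # instance id -> logical index

    def acquire(self, instance_id):
        # Registering the same instance twice returns the same index.
        if instance_id in self._assigned:
            return self._assigned[instance_id]
        if self._free:
            index = heapq.heappop(self._free)   # reuse the lowest free index
        else:
            index = self._next
            self._next += 1
        self._assigned[instance_id] = index
        return index

    def release(self, instance_id):
        # Unknown ids are ignored so a duplicate release is harmless.
        index = self._assigned.pop(instance_id, None)
        if index is not None:
            heapq.heappush(self._free, index)


pool = LogicalIdPool()
tm_a, tm_b, tm_c = uuid.uuid4(), uuid.uuid4(), uuid.uuid4()
print(pool.acquire(tm_a))  # 1
print(pool.acquire(tm_b))  # 2
pool.release(tm_a)         # TaskManager A goes away
print(pool.acquire(tm_c))  # 1, so taskmanager.1.gc_time continues
```

With UUID-based names the third TaskManager would have started a brand new 
series; here it continues the series previously written under index 1.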

Comments?

> Expose metrics to graphite, ganglia and JMX.
> --------------------------------------------
>
>                 Key: FLINK-1502
>                 URL: https://issues.apache.org/jira/browse/FLINK-1502
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager, TaskManager
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Assignee: Dongwon Kim
>            Priority: Minor
>             Fix For: pre-apache
>
>
> The metrics library makes it easy to expose collected metrics to other 
> systems such as Graphite, Ganglia or Java's JMX (VisualVM).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
